Apple Ferret, a Multimodal LLM: The First Comprehensive Guide (Quick Demo with Steps)

  • Published: 28 Nov 2024

Comments • 38

  • @JarvislabsAI  10 months ago +1

    Here are the steps to follow to set it up on a JarvisLabs instance: gist.github.com/svishnu88/ec6b0e5a76649ab7a04ab2f613355340
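    In rough outline, the setup boils down to something like this (versions and repo URL as I remember them; follow the gist for the exact commands):
    git clone https://github.com/apple/ml-ferret
    cd ml-ferret
    # fresh conda env so Ferret's pinned dependencies don't clash with anything else
    conda create -n ferret python=3.10 -y
    conda activate ferret
    pip install --upgrade pip
    pip install -e .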

    • @BoominGame  9 months ago

      (FYI: the head is made with a fennel.)

  • @Jasonknash101  10 months ago +2

    If this runs on my phone and can work with Shortcuts and Siri, it could be a killer app... in the making.

    • @JarvislabsAI  10 months ago

      Quite possible in the near future.

    • @kendrickpi  10 months ago

      Apple’s ‘Knowledge Navigator’ device is long overdue. I hope this will teach me languages better and faster - with speech - teaching me the sounds and words by speaking and listening to me. A human race able to converse in many languages may drive peace and understanding. The wolf we feed thrives. Let’s feed the good wolf!

  • @MengshiCen  5 months ago

    I read "Then download LLaVA's first-stage pre-trained projector weight" in the README. Where does this go?

  • @ArimaShukla-bt7yw  9 months ago +1

    Hi, thank you for your video; it is very nice. I have a question regarding my Mac. I have a 2018 MacBook Pro with Intel Iris Plus Graphics 655 (1536 MB). I have tried the steps and it gives me the error "git lfs is not installed. Please install and run git lfs install followed by git lfs pull in the cloned repository." However, git LFS is installed. I have tried debugging but no luck. Can you suggest something?

    • @JarvislabsAI  9 months ago

      I've never tried installing LFS on a Mac. If git LFS does not work, you can skip those commands and put the weights directly in the proper location.
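      For example, something like this should fetch the full weights without git LFS (paths illustrative; huggingface-cli ships with the huggingface_hub package):
      pip install -U "huggingface_hub[cli]"
      # downloads the actual weight files rather than LFS pointer stubs
      huggingface-cli download lmsys/vicuna-7b-v1.3 --local-dir /home/models/vicuna-7b-v1.3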

  • @everlasts  10 months ago +2

    Thanks for the demo. Is it possible to run and train the Ferret model locally on an M3 Max with 128 GB of RAM?

    • @JarvislabsAI  10 months ago +2

      Training on an M3 won't be possible, or will be very hard; the Apple team used 8 Nvidia A100s 😀. Inference should work, though: in fp16 the 7B model needs roughly 7B × 2 bytes ≈ 14 GB just for the weights, which fits easily in 128 GB, whereas training needs several times that for gradients, optimizer states, and activations.

  • @gthin  7 months ago

    Hey Vishnu, thanks for the simple explanations. I'm a designer, not a dev, but seeing this I have a question: does this Ferret model work the same way as 'circle something on screen to search' on the latest Samsung Galaxy phones? Ultimately it recognises what we point at, draw a box around, or sketch... right?

    • @JarvislabsAI  7 months ago

      Yeah, that's right. I have not used a Samsung, but the model works in a similar way.

  • @nishantroy3284  9 months ago

    Really great video 👏. Just two very quick questions: can we try and run this on an M1 MacBook Air, and on any other non-Apple machine? Huge help 🙏 if possible.

    • @JarvislabsAI  9 months ago

      I think there is growing compatibility for Mac M1 and M2 chips, though I have not tried it personally. You may find this interesting: github.com/Mozilla-Ocho/llamafile

    • @nishantroy3284  9 months ago

      @@JarvislabsAI thank you for the suggestions.

  • @codeplaywatch  10 months ago

    Hey, thank you for the great video. It's a little confusing which version of Vicuna I should download. You said it's v1.3, but you're showing v1.5 on Hugging Face. Again, great video!

    • @JarvislabsAI  10 months ago +1

      Yes, that's right: we should be using huggingface.co/lmsys/vicuna-7b-v1.3. I also added the steps here: gist.github.com/svishnu88/ec6b0e5a76649ab7a04ab2f613355340
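      If git LFS is working, the download itself is just (target path illustrative):
      git lfs install
      git clone https://huggingface.co/lmsys/vicuna-7b-v1.3 /home/models/vicuna-7b-v1.3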

  • @johnt2491  10 months ago

    OK that image you're using is just scary LOL 😂 😂 😂

  • @ItayVerkh  10 months ago

    Can you do a follow-up video on how to actually create a useful checkpoint? What you've shown is only the base model, and it doesn't output the objects (like [obj1], [obj2]) or a picture with bounding boxes. The weights you loaded are just the base model; to create what they show in the paper, you need to fine-tune on LLaVA data.

    • @JarvislabsAI  10 months ago

      When you run these steps:
      python3 -m ferret.model.apply_delta \
      --base /home/models/vicuna-7b-v1.3 \
      --target /home/models/ferret-7b-v1-3 \
      --delta /home/models/ferret-7b-delta
      it creates the final weights.
      At the bottom right of the demo there is a "Show location" button; clicking it shows the boxes on the image.
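      For reference, that demo UI is the LLaVA-style three-process Gradio app from the repo. From memory of the README it is launched roughly like this (check the README for the exact flags):
      python3 -m ferret.serve.controller --host 0.0.0.0 --port 10000
      python3 -m ferret.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 \
          --port 40000 --worker http://localhost:40000 \
          --model-path /home/models/ferret-7b-v1-3 --add_region_feat
      python3 -m ferret.serve.gradio_web_server --controller http://localhost:10000 --add_region_feat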

  • @enggm.alimirzashortclipswh6010  9 months ago

    I've been trying to sign in to JarvisLabs but am stuck at the mobile verification step.

    • @JarvislabsAI  9 months ago

      If the issue is still not resolved, please ping us on the chat on the website, or drop an email to hello@jarvislabs.ai.

  • @vpsirsirvp  10 months ago +1

    Thank you for your video. I'm stuck at the checkpoints step. I downloaded Vicuna 1.3 and the Ferret delta as per your video; there is no Ferret 1.3, so I suppose it gets created once you have Vicuna 1.3 and the Ferret delta. I then ran the code after changing to the right directory, and got the error "_pickle.UnpicklingError: invalid load key, 'v'." Can you please give more guidance? It seems this part is missing. Thank you!

    • @JarvislabsAI  10 months ago

      Hi, the key step is to create a separate environment and run the steps inside it. Once the weights are downloaded, you need to run:
      python3 -m ferret.model.apply_delta \
      --base /home/models/vicuna-7b-v1.3 \
      --target /home/models/ferret-7b-v1-3 \
      --delta /home/models/ferret-7b-delta
      Adjust the locations to your setup.
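      One common cause of "invalid load key, 'v'": the downloaded weight files are still git-LFS pointer stubs (small text files that begin with "version https://git-lfs..."), so torch tries to unpickle the letter 'v'. A quick check (shard name illustrative):
      head -c 120 /home/models/vicuna-7b-v1.3/pytorch_model-00001-of-00002.bin
      # if this prints "version https://git-lfs.github.com/spec/v1", the real
      # weights were never fetched; run `git lfs pull` inside that clone
      # (or re-download with huggingface-cli)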

    • @vpsirsirvp  10 months ago

      @@JarvislabsAI I created an environment and ran all the previous steps as shown in your video. Then I put vicuna-7b-v1.3 and ferret-7b-delta into a folder under models. I even created a new empty folder called ferret-7b-v1-3 under the models folder. Once I ran the code below, I got the error invalid load key "v":
      python3 -m ferret.model.apply_delta \
      --base /home/models/vicuna-7b-v1.3 \
      --target /home/models/ferret-7b-v1-3 \
      --delta /home/models/ferret-7b-delta
      I've already adjusted the locations for my environment.
      Can you give more hints? This part is not covered in your video.

    • @JarvislabsAI  10 months ago

      Can you ping us on the chat on the website? We can get on a call and I can help you debug.

    • @vpsirsirvp  10 months ago

      @@JarvislabsAI It seems that we need to train the ferret-7b-v1-3 model first?

    • @JarvislabsAI  10 months ago

      Not required; you just apply the delta to compute the model. I am writing up the exact steps to recreate it and will add them to the description in a while.

  • @enggm.alimirzashortclipswh6010  9 months ago

    There are three models in the command:
    python3 -m ferret.model.apply_delta \
    --base ./model/vicuna-7b-v1-3 \
    --target ./model/ferret-7b-v1-3 \
    --delta path/to/ferret-7b-delta
    I have the base and delta models; where do I find the target model?

    • @JarvislabsAI  9 months ago

      You can use Apple Ferret on JarvisLabs; we have added all the required weights so that it works out of the box. If you face any challenges, you can ping us in the chat on the website.
      jarvislabs.ai/docs/apple-ferret
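      To answer the question directly: --target is not a third model to download; it is the output path. apply_delta loads the base, applies the delta, and writes the merged model there, so the folder only exists after the command finishes:
      python3 -m ferret.model.apply_delta \
      --base ./model/vicuna-7b-v1-3 \
      --target ./model/ferret-7b-v1-3 \
      --delta path/to/ferret-7b-delta
      # ./model/ferret-7b-v1-3 is created by this command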

    • @enggm.alimirzashortclipswh6010  9 months ago

      @@JarvislabsAI I extracted the model from the GitHub repo, so now I have the model, tokenizer, image_processor, and context_length. When I call the model in my notebook and pass just text, it responds well; I am facing an issue with the image-based input format. I am using model.generate(input_ids, max_new_tokens=200).
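      For anyone hitting the same wall: Ferret inherits LLaVA's multimodal API, so the image goes in as a preprocessed tensor alongside a prompt that contains the special image token. A minimal sketch, assuming the usual LLaVA-style names (the import paths and the images kwarg are assumptions carried over from the LLaVA codebase Ferret forks; check Ferret's eval scripts if they differ):
      from PIL import Image
      import torch

      # assumed helper names from the LLaVA-style codebase
      from ferret.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
      from ferret.mm_utils import tokenizer_image_token

      # preprocess the image with the image_processor returned at load time
      image = Image.open("example.jpg").convert("RGB")
      image_tensor = image_processor.preprocess(image, return_tensors="pt")["pixel_values"]
      image_tensor = image_tensor.to(model.device, dtype=torch.float16)

      # the prompt must contain the <image> placeholder; plain tokenizer(prompt)
      # will not insert the image token, so use the repo's tokenizer helper
      prompt = DEFAULT_IMAGE_TOKEN + "\nWhat is in this image?"
      input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX,
                                        return_tensors="pt").unsqueeze(0).to(model.device)

      with torch.inference_mode():
          output_ids = model.generate(
              input_ids,
              images=image_tensor,  # LLaVA-style keyword argument
              max_new_tokens=200,
          )
      print(tokenizer.decode(output_ids[0], skip_special_tokens=True))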