PaliGemma by Google: Inference and Fine Tuning of Vision Language Model

  • Published: 14 May 2024
  • In this video I'm diving deep into PaliGemma, a new vision language model by Google! PaliGemma can analyze images and text, making it super versatile for tasks like image captioning and question answering. I'll show you how to use this powerful tool and get the most out of it through fine-tuning.
    Don't forget to like and subscribe for more tech breakdowns!
    Notebook: github.com/AIAnytime/PaliGemm...
    PaliGemma HF: huggingface.co/collections/go...
    Join this channel to get access to perks:
    / @aianytime
    To further support the channel, you can contribute via the following methods:
    Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW
    UPI: sonu1000raw@ybl
    #google #ai #openai
  • Science

Comments • 21

  • @TaHa-nf5vc • 2 months ago

    Bro, I love your channel, your videos are high quality and so instructive.
    And that hairstyle, clearly DOPE, I personally think it's the one :D

  • @SravanKumar-cj4uu • 1 month ago

    Thank you for your detailed explanation. Your classes are quite interesting and are building confidence to move further forward. I need some suggestions: I saw a medical chatbot using Llama 2 on a CPU machine, which was all open source. Similarly, I need to build an image-to-text multimodal model on a CPU using all open-source tools. Please provide your suggestions.

  • @robinchriqui2407 • 1 month ago

    Hi, thank you very much. Is it the same kind of process for any VLM model on Hugging Face?

  • @karthiksundaram544 • 2 months ago

  • @latentbhindi837 • 1 month ago • +1

    Great vid!
    Also, United are gonna bottle the FA Cup xd.

  • @Mesenqe • 1 month ago • +1

    Thank you for the tutorial. I have one question: how can we use our own fine-tuned model at inference time? Can you make a video on how to use our own fine-tuned PaliGemma model during inference, or suggest some links to read? Thank you.

    • @clawbro • 27 days ago

      Exactly, I have the same issue too. I can't use it, and save_pretrained is not working.
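
      One way this can work, as a minimal sketch: it assumes the fine-tuning used a PEFT/LoRA adapter that was saved with save_pretrained to a local folder (the folder name my-paligemma-lora below is hypothetical), on top of the base checkpoint used in the video.

      import torch
      from PIL import Image
      from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
      from peft import PeftModel

      base_id = "google/paligemma-3b-pt-224"
      adapter_dir = "my-paligemma-lora"  # hypothetical: folder written by save_pretrained

      processor = AutoProcessor.from_pretrained(base_id)
      base_model = PaliGemmaForConditionalGeneration.from_pretrained(
          base_id, torch_dtype=torch.bfloat16, device_map="auto"
      )
      # Attach the saved LoRA adapter to the base weights and merge it,
      # so inference runs on a single set of weights.
      model = PeftModel.from_pretrained(base_model, adapter_dir)
      model = model.merge_and_unload()

      image = Image.open("example.jpg")  # any local test image
      inputs = processor(text="caption en", images=image, return_tensors="pt").to(model.device)
      with torch.no_grad():
          output = model.generate(**inputs, max_new_tokens=50)
      print(processor.decode(output[0], skip_special_tokens=True))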

  • @souravbarua3991 • 1 month ago

    Please make a video on a multimodal/vision LM with video data, i.e. one that takes a video as input in place of an image.

  • @astheticsouls7770 • 2 months ago

    Is PaliGemma good for RAG?

  • @ricorauschkolb2801 • 2 months ago • +3

    Is the model also good for OCR tasks?

    • @miguelalba2106 • 1 month ago

      You need to fine-tune it to achieve good results; it is a good basis for any visual understanding task.

  • @JokerJarvis-cy2sw • 2 months ago

    Sir, can I use this on my local machine or on a Raspberry Pi? I want to build a robot with a Raspberry Pi.
    If not, can you please suggest an alternative, if not locally then via a (free) API?

  • @chongdashu • 1 month ago • +2

    > processor = PaliGemmaProcessor(model_id)
    gives the following error:
    90 raise ValueError("You need to specify an `image_processor`.")
    91 if tokenizer is None:
    92 raise ValueError("You need to specify a `tokenizer`.")
    93 if not hasattr(image_processor, "image_seq_length"):
    94 raise ValueError("Image processor is missing an `image_seq_length` attribute.")
    It should be PaliGemmaProcessor.from_pretrained(model_id) instead.
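
    For reference, a minimal sketch of the corrected call (assuming model_id points at a PaliGemma checkpoint on the Hub, e.g. the base one used in the video):

    from transformers import PaliGemmaProcessor

    model_id = "google/paligemma-3b-pt-224"  # assumption: the base checkpoint
    # The bare constructor expects ready-made image_processor and tokenizer
    # objects, which is why it raises the ValueErrors quoted above.
    # from_pretrained builds both from the checkpoint for you.
    processor = PaliGemmaProcessor.from_pretrained(model_id)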

  • @barderino5673 • 2 months ago

    I'm still confused about why we target q, o, k, v, gate, up, down... that is, all the linear layers? Why all of them?

    • @nurusterling8024 • 1 month ago • +1

      Research shows that this is the closest to full fine-tuning in terms of performance
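
      As a concrete illustration, a minimal sketch of a PEFT LoraConfig that targets all of those linear projections (the rank r=8 is an arbitrary assumption, not necessarily the value used in the video):

      from peft import LoraConfig

      lora_config = LoraConfig(
          r=8,
          target_modules=[
              "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
              "gate_proj", "up_proj", "down_proj",      # MLP projections
          ],
          task_type="CAUSAL_LM",
      )
      # Covering the MLP projections (gate/up/down) as well as the attention
      # projections is what the QLoRA paper reported comes closest to full
      # fine-tuning quality.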

  • @MegaClockworkDoc • 1 month ago • +1

    You put a lot of effort into this video, but your audio is terrible.

    • @AIAnytime • 1 month ago

      Will improve in future videos...

    • @rizzlr • 1 month ago

      @AIAnytime You could use AI to improve it too.