Ollama Supports Llama 3.2 Vision: Talk to ANY Image 100% Locally!

  • Published: 4 Feb 2025

Comments • 26

  • @angelochu3156
    @angelochu3156 2 months ago +1

    May I know what the spec of your Mac is to run this 11B vision model?

  • @JemQuadri
    @JemQuadri 2 months ago

    Sir, I have a question: say you want to train a pre-existing vision model to generate images based on an input (video), which model would you go for and how would you tweak it? I look forward to your response.

  • @jtjames79
    @jtjames79 2 months ago +4

    Have to figure out a way to get cameras to work on NPCs in Skyrim. That way they can comment on my giant... umm... sword.

  • @Lutfor_R_Sohan
    @Lutfor_R_Sohan 2 months ago

    Can you please make a video on how to run g4f locally? I tried but failed.

  • @sambarseghyan6412
    @sambarseghyan6412 2 months ago

    When I use the command line, everything works and it analyzes the image. But if I use the UI, it won't upload the image - it can't see the image to analyze for some reason. Any help would be greatly appreciated.
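
    A minimal way to reproduce the working command-line path from a script, as a sketch only: it assumes the official ollama Python package is installed, the llama3.2-vision model has already been pulled, and photo.png is a placeholder name for a local image.

        import ollama

        # Send one local image to the locally running Llama 3.2 Vision model.
        # The "images" field accepts local file paths (or raw bytes).
        response = ollama.chat(
            model="llama3.2-vision",
            messages=[{
                "role": "user",
                "content": "Describe this image.",
                "images": ["photo.png"],
            }],
        )

        print(response["message"]["content"])

    If this works while the UI still fails, the problem is likely in how the UI uploads the image rather than in the model itself.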

  • @dhanush4132
    @dhanush4132 2 months ago

    Can I run the 90B on my Mac M1 without any heating issues or other problems? Please, someone answer me.

    • @KasparBredahlRasmussen-rn5zq
      @KasparBredahlRasmussen-rn5zq 2 months ago +1

      You would probably need a maxed-out M1 Ultra to do that. 11B is more realistic, but it would probably still need a decent amount of RAM. I had 8B models running on my M2 with 16 GB. It is OK, but not fast.

  • @stevethompson210
    @stevethompson210 2 months ago +1

    What hardware would be needed to run the 90B model?

    • @MervinPraison
      @MervinPraison  2 months ago +5

      Llama 3.2 Vision 11B requires at least 8 GB of VRAM, and the 90B model requires at least 64 GB of VRAM (pull commands for both tags are sketched after this thread).

    • @stevethompson210
      @stevethompson210 2 months ago

      @@MervinPraison Thanks for this information and for all your videos!

    • @TheJGAdams
      @TheJGAdams 2 months ago

      @@MervinPraison How does one get 64 GB of VRAM? No gaming card has that much memory.

    • @ATaleTold
      @ATaleTold 10 days ago

      @@TheJGAdams A Mac uses its system memory in conjunction with (or in place of) the GPU's RAM, unlike a non-Mac PC. Google "What is Unified Memory on a Mac."
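
      For context on the two variants discussed above, a minimal sketch of pulling each tag with the ollama Python package; the tag names are as listed in the Ollama model library, and the 90B download is very large, so only pull it if you actually have the memory for it.

          import ollama

          # The default tag resolves to the 11B variant (~8 GB of VRAM per the reply above).
          ollama.pull("llama3.2-vision")

          # The 90B variant needs roughly 64 GB of (V)RAM - uncomment only if you have it.
          # ollama.pull("llama3.2-vision:90b")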

  • @cissemy
    @cissemy 2 months ago

    Is there any model that can recognize a form (extract data from a PDF form)?

    • @houstonfirefox
      @houstonfirefox 2 months ago

      A PDF form can often be represented as an image. To test this, I first tried to get Vision to recognize a PDF version of a receipt - it failed. I then opened the PDF in a PDF viewer and saved it as a PNG file. Vision was then able to ferret out all of the details very accurately! I'm not sure how a multi-page PDF would be handled - perhaps with multiple calls, one for each page. It IS possible! (A minimal sketch of this workflow is below this thread.)

    • @cissemy
      @cissemy 2 months ago

      @@houstonfirefox
      It is a one-page form. Can you help?
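
      A minimal sketch of the PDF-to-image workflow described above, assuming the pdf2image package (which needs the poppler utilities installed) plus the ollama Python package; form.pdf is a placeholder file name for the one-page form.

          import ollama
          from pdf2image import convert_from_path

          # Render each PDF page to a PNG (a one-page form yields a single image).
          pages = convert_from_path("form.pdf", dpi=200)
          for i, page in enumerate(pages):
              path = f"page_{i}.png"
              page.save(path)

              # Ask the vision model to extract the form fields from that page.
              response = ollama.chat(
                  model="llama3.2-vision",
                  messages=[{
                      "role": "user",
                      "content": "Extract every field name and value from this form as a list.",
                      "images": [path],
                  }],
              )
              print(response["message"]["content"])

      For a multi-page PDF this simply makes one call per page, as suggested above.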

  • @stevethompson210
    @stevethompson210 2 months ago +1

    How do you know for sure that it is not sending your data out to servers?

    • @not_the_lil_prince
      @not_the_lil_prince 2 months ago +2

      Because you can use it without an internet connection.

    • @motivation_guru_93
      @motivation_guru_93 2 months ago

      Really?

    • @not_the_lil_prince
      @not_the_lil_prince 2 months ago

      @@motivation_guru_93 yes

    • @stevethompson210
      @stevethompson210 2 months ago

      @@not_the_lil_prince But how do you know that Zuck doesn't have it upload your data the next time you connect to Wi-Fi?
      If we want to use it for work, we can't afford to find out down the road that it has actually been uploading our data.

  • @moszis
    @moszis 2 months ago

    Do you know of any model that would take in video clips in a similar way?

    • @MervinPraison
      @MervinPraison  2 months ago +1

      Alternatively, you can take a screenshot of each frame and pass it to the LLM.

    • @moszis
      @moszis 2 months ago

      @@MervinPraison Thank you. I was thinking about that. It's possible to break the video into frames directly - no need for screenshots. You would need to get creative with prompts to group the responses into "scenes" and derive appropriate context.
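
      A minimal sketch of that frame-by-frame approach, assuming OpenCV (opencv-python) and the ollama Python package; clip.mp4 and the one-frame-per-second sampling rate are placeholder choices.

          import cv2
          import ollama

          cap = cv2.VideoCapture("clip.mp4")
          fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back if FPS metadata is missing
          step = int(fps)                        # sample roughly one frame per second

          index = 0
          while True:
              ok, frame = cap.read()
              if not ok:
                  break
              if index % step == 0:
                  path = f"frame_{index}.png"
                  cv2.imwrite(path, frame)
                  # Describe each sampled frame; grouping the answers into "scenes"
                  # would then be a prompting / post-processing step.
                  response = ollama.chat(
                      model="llama3.2-vision",
                      messages=[{
                          "role": "user",
                          "content": "Describe what is happening in this frame.",
                          "images": [path],
                      }],
                  )
                  print(index, response["message"]["content"])
              index += 1
          cap.release()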

  • @JNET_Reloaded
    @JNET_Reloaded 2 months ago

    It's crap - it thinks every image of a person is an Australian president!