May I know the specs of your Mac for running this 11B vision model?
Sir, I have a question: say you wanted to train a pre-existing vision model to generate images based on an input (video), which model would you go for, and how would you tweak it? I look forward to your response.
Have to figure out a way to get cameras to work on NPCs in Skyrim. That way they can comment on my giant... umm... sword.
Can you please make a video on how to run g4f locally? I tried but failed.
When I use the command line, everything works and it analyzes the image. But if I use the UI, it won't upload the image - it can't see the image to analyze for some reason. Any help would be greatly appreciated.
Can I run the 90B on my Mac M1 without any heating issues, or any issues at all? Please, someone answer me.
You would probably need a maxed-out M1 Ultra to do that. 11B is more realistic but would probably still need a decent amount of RAM. I had 8B models running on my M2 with 16GB. It's okay, but not fast.
What hardware would be needed to run the 90b model?
Llama 3.2 Vision 11B requires at least 8 GB of VRAM, and the 90B model requires at least 64 GB of VRAM.
@@MervinPraison. Thanks for this information and for all your videos!
@@MervinPraison How does one get 64 GB of VRAM? No gaming card has that much memory.
@@TheJGAdams A Mac uses its system memory in conjunction with the GPU's RAM (or in place of it), unlike a non-Mac PC. Google "What is Unified Memory on a Mac."
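As a rough sanity check on the VRAM figures above, a common back-of-the-envelope estimate is parameter count times bytes per parameter at the chosen quantization, plus some overhead for activations and (here) the vision encoder. The 20% overhead figure below is an assumption for illustration, not a measured value:

```python
# Rough VRAM estimate for a quantized model: params * bytes-per-param,
# plus ~20% overhead (the overhead fraction is an assumed ballpark).
def vram_gb(params_billion: float, bits_per_param: int, overhead: float = 0.2) -> float:
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return round(bytes_total * (1 + overhead) / 1e9, 1)

# 11B at 4-bit quantization fits comfortably in 8 GB; 90B at 4-bit
# lands in the 64 GB class, matching the requirements quoted above.
print(vram_gb(11, 4))   # ~6.6 GB
print(vram_gb(90, 4))   # ~54.0 GB
```

This is why the 90B model is out of reach for consumer GPUs but feasible on a Mac with enough unified memory.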
Is there any model that can recognize a form (extract data from a PDF form)?
A PDF form can often be represented as an image. To test this, I first tried to get Vision to recognize a PDF version of a receipt - it failed. I then opened the PDF in a PDF viewer and saved it as a PNG file. Vision was then able to ferret out all of the details very accurately! I'm not sure how a multi-page PDF would be handled - perhaps with multiple calls, one for each page. It IS possible!
@@houstonfirefox It is a one-page form. Can you help?
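The PDF-to-PNG-per-page approach described above can be sketched in a few lines. This is a minimal sketch, assuming a local Ollama server with `llama3.2-vision` pulled, plus the `pdf2image` package (which needs the poppler utilities installed); the prompt text is just an example:

```python
# Sketch: extract data from a (possibly multi-page) PDF by rendering each
# page to a PNG in memory and sending it to a local vision model.
import base64
import io

def page_to_payload(png_bytes: bytes, prompt: str) -> dict:
    """Build one Ollama chat message carrying a base64-encoded page image."""
    return {
        "role": "user",
        "content": prompt,
        "images": [base64.b64encode(png_bytes).decode("ascii")],
    }

def extract_pdf(path: str, prompt: str = "List every field and value on this form."):
    from pdf2image import convert_from_path  # pip install pdf2image
    import ollama                            # pip install ollama

    results = []
    for page in convert_from_path(path, dpi=200):  # one PIL image per PDF page
        buf = io.BytesIO()
        page.save(buf, format="PNG")
        reply = ollama.chat(
            model="llama3.2-vision",
            messages=[page_to_payload(buf.getvalue(), prompt)],
        )
        results.append(reply["message"]["content"])
    return results  # one answer per page
```

For a one-page form this just makes a single call; multi-page PDFs get one call per page, as the comment above suggested.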
How do you know for sure that it is not sending your data out to servers?
Cuz you can use it without an internet connection
Really ?
@@motivation_guru_93 yes
@@not_the_lil_prince But how do you know that Zuck doesn't have it upload your data the next time you connect to WiFi? If we want to use it for work, we can't afford to find out down the road that it has actually been uploading our data.
Do you know of any model that would take in video clips in a similar way?
Alternatively, you can take a screenshot of each frame and pass it to the LLM.
@@MervinPraison thank you. I was thinking about that. its possible to break the video into frames directly, no need for screenshots. Would need to get creative with prompts to group the responses into "scenes" and derive appropriate context.
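The frames-to-descriptions idea above can be sketched as follows. This is a rough sketch, assuming OpenCV (`opencv-python`) and a running Ollama server with `llama3.2-vision`; the sampling interval and per-frame prompt are illustrative, and grouping the resulting descriptions into "scenes" is left to a follow-up prompt:

```python
# Sketch: sample every Nth frame from a video, describe each frame with a
# local vision model, and return (frame_index, description) pairs that a
# second prompt could group into scenes.
def sample_indices(total_frames: int, every_n: int) -> list:
    """Indices of the frames to keep (one every `every_n` frames)."""
    return list(range(0, total_frames, every_n))

def describe_video(path: str, every_n: int = 30):
    import cv2      # pip install opencv-python
    import ollama   # pip install ollama

    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    descriptions = []
    for idx in sample_indices(total, every_n):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)   # seek to the sampled frame
        ok, frame = cap.read()
        if not ok:
            break
        ok, png = cv2.imencode(".png", frame)   # encode the frame in memory
        reply = ollama.chat(
            model="llama3.2-vision",
            messages=[{
                "role": "user",
                "content": "Describe this video frame in one sentence.",
                "images": [png.tobytes()],
            }],
        )
        descriptions.append((idx, reply["message"]["content"]))
    cap.release()
    return descriptions
```

Sampling every 30th frame keeps roughly one frame per second for typical video, which keeps the number of model calls manageable.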
It's crap - it thinks every image of a person is an Australian president!