Thank you guys for the incredible support!
Remember to like and subscribe to help this channel out 🙏
Leon, I need your expertise, good sir
Your tutorials are so helpful for me. Also, simply subscribing to you keeps me updated on new AI releases and tools. I learned about Flowise and Langflow from you. I also learned about the release of Llama 3.2 Vision (this video!) from you as well! Thanks!
That's awesome to hear. Thank you
Thank you for helping us get started. You saved me quite some time figuring out how to enhance my local Llama with vision capabilities.
You're welcome 🤗
Super helpful. It's quite incredible how lacking in notes and examples these models are when released, so thanks very much.
I know!
Very useful, Leon, like everything you post! The best channel on the subject.
Thank you!
Tried both the 11B and 90B models. The 11B seems to be uncensored, while the 90B is censored (first shortfall). On top of that, it looks like multimodal models can't support parallel streams of actions (i.e., extract info from an image via OCR and then perform a search on the extracted contents). Last but not least, they seem to be able to process only one image at a time. The resulting capabilities appear to be far behind "commercial" models, unfortunately. Does anybody know if an uncensored version of a decent vision-enabled LLM has already been created?
I would love to see a Next.js app with Ollama. The cherry on top would be agents looking into the images and categorizing them or something. Thank you for your amazing content @Leon
Awesome suggestion
Ty~ this will help greatly, I was so tired of copy-pasting from a terminal lol
You're welcome 😁
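For anyone who'd rather script this than work in the terminal, here's a minimal sketch using the official `ollama` Python package. It assumes the Ollama server is running locally and that you've already pulled the model with `ollama pull llama3.2-vision`; the image path is a placeholder for your own file.

```python
def build_vision_messages(prompt: str, image_path: str) -> list[dict]:
    """Payload shape the Ollama chat API expects: a single user message
    carrying the prompt text plus a list of local image paths."""
    return [{"role": "user", "content": prompt, "images": [image_path]}]

def describe_image(image_path: str, prompt: str = "Describe this image.") -> str:
    # Requires `pip install ollama`, a running Ollama server, and the
    # llama3.2-vision model pulled beforehand.
    import ollama
    response = ollama.chat(
        model="llama3.2-vision",
        messages=build_vision_messages(prompt, image_path),
    )
    return response["message"]["content"]
```

Usage would be something like `print(describe_image("photo.jpg"))`, which sends the image alongside the prompt in one request.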
Nice! A Next.js Ollama client would be really cool to see. I also wonder how good this model is at web design: converting a design into code.
Ooh, interesting idea.
Thanks again for the great tutorial, Leon. Please create a Next.js app.
Will do
Thanks again, Leon. Another idea for a video: can we call one Agentflow/Chatflow from another Agentflow/Chatflow? The rationale is to break complex flows into smaller flows.
Which LLM can read PDFs that include mathematical notation written the classical way (for example, integration symbols, division lines, square-root symbols, etc.)? I would appreciate your answer.
Thanks for the video. I would like to see another video tutorial with Streamlit.
Will do
Hi Leon, how much VRAM does your computer have to run this 11B vision model?
Thank you! I'd like to see this model integrated with Flowise ❤ soon
Oh, trust me. I'll definitely create a Flowise video on this
Yes yes please build the app Leon !!!
Python, JS or both? 😁
@leonvanzyl I'm sure a lot of people want to see both lol !!
NextJS please
I hope this will be usable with LM Studio eventually. Hello from a fellow South African xD
Howzit!
I seriously need to create LM Studio videos as well
@leonvanzyl Oh yes! I have a cool multi-AI-agent chatroom running with an admin backend to control their convo, and humans can partake in the chat. Seriously believable chat agents, all running off Llama 3.2 Instruct 3B and LM Studio. Cheers on the content, and subbed!
Does Flowise allow us to use this with ollama chat?
Not yet, but I think they'll release the feature soon. I'll create a video on it as soon as it's available.
Awesome! These GPU specs are restrictive, though. I love Open WebUI, but it runs like crap on older systems.
I would think that it would be similar to the terminal, no?
Can we upload RAR or ZIP files containing Visual Studio projects so it can write code? Can it read these files like ChatGPT-4o?
What are the minimum hardware requirements to get this running smoothly?
The image / multimodal models are resource-intensive.
I have a laptop with an RTX 4070 and the responses were not too slow.
@leonvanzyl What about a Mac?
I want to buy a Mac for ML
I installed it two days ago, but today I'm getting the following problem: zsh: command not found: ollama
Why can't any of these vision models do OCR? I would think understanding text is easier.
Can we chat with multiple images?
Only one at a time
@leonvanzyl Hey, I have a use case where I want to chat with multiple images. Do you have any suggestions?
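Since the model only accepts one image per request, one workaround is to ask the same question of each image separately and then combine the answers. A minimal sketch, assuming the `ollama` Python package, a running local server, and the `llama3.2-vision` model already pulled:

```python
def summarize_answers(answers: dict[str, str]) -> str:
    """Combine per-image answers into one labelled block of text,
    sorted by image name for stable output."""
    return "\n".join(f"{name}: {text}" for name, text in sorted(answers.items()))

def ask_each_image(prompt: str, image_paths: list[str]) -> dict[str, str]:
    # One request per image, since the model handles a single image at a time.
    import ollama  # pip install ollama; requires a running Ollama server
    answers = {}
    for path in image_paths:
        response = ollama.chat(
            model="llama3.2-vision",
            messages=[{"role": "user", "content": prompt, "images": [path]}],
        )
        answers[path] = response["message"]["content"]
    return answers
```

You could then call `summarize_answers(ask_each_image("What is in this image?", ["a.jpg", "b.jpg"]))` and, if needed, feed the combined text back to a text-only model for cross-image reasoning.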
First comment
Haha
😎