Thanks!
Thanks for the support! This makes all my work worth it.
@AICodeKing How much of a donation is needed for the AI voice to speak at normal speed and not sound like a `creepy stalker`? :)
@NathanChambers I like the voice; perhaps a bidding war?
🎯 Key Takeaways for quick navigation:
00:00 *Introduction and mention of Microsoft's new open-source Vision model, Phi-3 Vision.*
00:13 *Phi-3 Vision is part of Microsoft's Phi-3 model family, announced at the Microsoft Build 2024 conference.*
00:38 *Phi-3 Vision is a 4.2 billion parameter multimodal model supporting a 128k-token context window for long conversations.*
02:24 *Trained on 500 billion tokens with 512 Nvidia H100 GPUs, focusing on high-quality reasoning data.*
02:54 *Outperforms GPT-4 Vision in multiple benchmarks despite its smaller size.*
03:46 *Beats larger models on the MMBench and ScienceQA benchmarks, demonstrating impressive performance.*
04:55 *Consistently outperforms Claude 3 and Gemini models in various benchmarks.*
06:37 *Available for use on Hugging Face and Azure AI Studio, with potential for on-device inference.*
09:29 *Initial testing shows it excels at code generation and data conversion tasks but struggles with more complex reasoning questions.*
13:44 *Overall, Phi-3 Vision is a promising model for lightweight AI applications, performing well even on mobile devices.*
Made with HARPA AI
Learn to communicate, or is AI going to do that for you from now on?
There cannot be a better summary than this.
Here's the rephrased sentence: Artificial intelligence is advancing rapidly, and I lack the time to evaluate every available model.
Could I run inference with this model on every image of a 200k-frame video, just on my CPU? My CPU is currently fine for my own OCR model, which is trained on 3,000 images. Or do I need a GPU?
I think you can do that.
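Seconding that, here's a rough sketch of how it could look with Hugging Face transformers (the model needs trust_remote_code) plus OpenCV for frame grabbing. The prompt format and generate call follow the model card as I remember it; the file name, sampling step, and prompt are placeholders, so treat this as a starting point rather than tested code:

```python
# Sketch: Phi-3 Vision over sampled video frames on CPU.
# Assumes: pip install transformers torch opencv-python pillow
# "video.mp4" and step=30 are placeholders, not from the original question.
import cv2
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,      # full precision for CPU
    _attn_implementation="eager",   # no flash-attention on CPU
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

cap = cv2.VideoCapture("video.mp4")
step = 30  # sample every 30th frame; all 200k one-by-one on CPU is impractical
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % step == 0:
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        messages = [{"role": "user", "content": "<|image_1|>\nRead any text in this frame."}]
        prompt = processor.tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = processor(prompt, [image], return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=64,
                             eos_token_id=processor.tokenizer.eos_token_id)
        out = out[:, inputs["input_ids"].shape[1]:]  # keep only the new tokens
        print(idx, processor.batch_decode(out, skip_special_tokens=True)[0])
    idx += 1
cap.release()
```

Even sampled, expect seconds per frame on CPU for a 4.2B multimodal model, so your dedicated OCR model will stay far cheaper for exhaustive per-frame text extraction; a VLM like this is better used for spot-checking frames.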
Which app are you using GPT-4o in?
Can one access Phi 3 online without locally running the software in one's computer?
Yes. You can access it for free on Azure AI Studio and HuggingFace chat
The image description function is too tightly locked down... almost all of my public domain fine art examples were simply refused as "unable to describe as contains inappropriate material". I couldn't summarize an A&E afternoon special with this! Not trolling. I have yet to try the other Phi-3 use cases. I have heard that the LLM is good for function handling (not sure about direct function calling) with the help of "Guidance" from MS. I am working on media production, and these are my two most useful task cases.
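Re: Guidance, here's roughly what constrained function-style output looks like with it; a minimal sketch assuming guidance's models.Transformers wrapper and its gen/select primitives, where the Phi-3-mini model ID and the toy edit-request schema are my own untested placeholders:

```python
# Sketch: constrained "function-style" output via Microsoft's guidance library.
# Assumes the guidance >= 0.1 API (models.Transformers, gen, select);
# the model ID and the toy schema below are illustrative, not tested.
from guidance import models, gen, select

lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")

# Constrain generation to a fixed JSON skeleton so the "function call"
# always parses, instead of hoping the model formats it correctly.
lm += 'Extract the edit request as JSON.\n'
lm += 'Request: trim the first 5 seconds and boost the audio.\n'
lm += '{"action": "'
lm += select(["trim", "boost_audio", "crop"], name="action")
lm += '", "args": "'
lm += gen("args", stop='"', max_tokens=30)
lm += '"}'

print(lm["action"], lm["args"])  # captured fields, guaranteed to match the schema
```

The point of the constrained skeleton is that the small model only ever fills in the blanks, which is why people report Phi-3 handling "function calls" reliably with Guidance despite its size.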
I just did some testing and it seems this really sucks, because it always forces any software suggestions to be PAID AND BY MICROSOFT. Go figure... it's total BS if you want to ask it actual questions, since it will only give pro-Microsoft answers and block competitors. No matter how many times I said free, it would suggest paid Microsoft products :/ Sticking with Llama and Mistral!
I may have to hit the playground and check this out... for defense or exploit... every bug is a feature... this is and always has been the MS philosophy...
Great info👍🏿
Maybe it is good at science because they optimized it for Khan AI, to be an assistant for US teachers?
Nice video, thanks! Do you also provide the code for the demo you showed? @AICodeKing