The Mistral Medium answer "ball is lost during transit" doesn't necessarily mean that the ball is lost in the box during shipping; it could be lost during transit between the moment you put the ball in the bag and the moment you put the bag in the box. IMHO the model got it right and the human just didn't interpret the result correctly. And the GPT-4 answer you labeled as "perfect" could be wrong as well: depending on how the bag was tilted, the ball wouldn't have fallen out of the bag. I feel like the mistral-medium answer was the most accurate one.
Yeah, in terms of reasoning Mistral is better. Even the small one said that the ball could be lost during transit or on arrival as well.
I don't think he used GPT-4 at all? There's a huge difference from GPT-3.5.
🎯 Key Takeaways for quick navigation:
00:00 🚀 *Overview and Platform Introduction*
- Mistral AI API access for testing, compared to GPT-3.5 and GPT-4.
- Introduction to Mistral AI platform features, including models, streaming options, and safe mode.
- Pricing overview and initial impressions of Mistral AI's competitiveness.
01:24 💰 *Pricing Comparison*
- Detailed pricing calculations for Mistral AI's medium and small models.
- Competitive pricing compared to GPT-3.5 Turbo.
- Ready to proceed with testing after the pricing analysis.
02:21 🧠 *Testing Scenarios Introduction*
- Explanation of the three testing scenarios: Shirt problem, World model problem, and Python Snake game.
- Description of the reasoning and coding challenges posed to Mistral AI models.
04:21 🤖 *Testing GPT-3.5 on the Shirt Problem*
- Quick test of GPT-3.5 on the Shirt problem.
- GPT-3.5's incorrect response and analysis of the mistake.
- Setting the stage for Mistral AI's response to the same problem.
05:17 👕 *Testing Mistral Small Model on the Shirt Problem*
- Mistral Small model's correct response to the Shirt problem.
- Highlighting Mistral's ability to understand the parallelism in the problem.
- Confidence in Mistral's capability based on the small model's performance.
06:08 🌍 *Testing Mistral Medium Model on the World Problem*
- Introduction to the World model problem.
- GPT-3.5's incorrect response to the World problem.
- Preparing to test both Mistral Small and Medium models on the same problem.
07:29 🌐 *Testing Mistral Small and Medium Models on the World Problem*
- Mistral Small model's response and analysis.
- Mistral Medium model's response and analysis.
- Comparison with GPT-4's accurate response to the World problem.
08:51 🐍 *Testing Python Snake Game - Mistral Small Model*
- Implementing and testing Python Snake game code using the Mistral Small model.
- Evaluation of the generated code's quality.
- Comparison of Mistral's response with those of the other models.
10:31 🎮 *Testing Python Snake Game - Mistral Medium Model*
- Implementing and testing Python Snake game code using the Mistral Medium model.
- Evaluation of the generated code's quality and UI.
- Comparison with the Mistral Small model's and GPT-4's responses.
11:42 🔄 *Testing Streaming Function - Mistral Tiny, Small, and Medium Models*
- Introduction to streaming functionality on Mistral AI.
- Quick streaming test on the Mistral Tiny, Small, and Medium models.
- Comparison of streaming speed among the different Mistral models.
13:05 🌐 *Conclusion and Future Plans*
- Positive feedback on Mistral AI's performance and streaming functionality.
- Expressing excitement about exploring other APIs and supporting Mistral's progress.
- Curiosity about Mistral's Medium model and the potential for more benchmarks and information.
Made with HARPA AI
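On the streaming point summarized above: a minimal sketch of what a streaming chat request to Mistral's API looks like. The field names follow the OpenAI-compatible schema Mistral documents; the model name and prompt here are just placeholders, so check the current API docs before relying on this.

```python
import json

# Build the request body for a streaming chat completion (per Mistral's docs).
API_URL = "https://api.mistral.ai/v1/chat/completions"
payload = {
    "model": "mistral-small",  # or mistral-tiny / mistral-medium
    "messages": [{"role": "user", "content": "Say hello."}],
    "stream": True,            # ask the server for incremental SSE chunks
}
print(json.dumps(payload, indent=2))

# Actually sending it would look roughly like this (needs the `requests`
# package and a real API key):
# headers = {"Authorization": f"Bearer {MISTRAL_API_KEY}"}
# with requests.post(API_URL, json=payload, headers=headers, stream=True) as r:
#     for line in r.iter_lines():
#         print(line.decode())  # each non-empty line is one SSE chunk
```

The same payload with `"stream": False` (or the key omitted) returns a single complete response instead of chunks.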
🎯 Key Takeaways for quick navigation:
00:00 🤖 *Overview of Mistral AI models and pricing*
02:21 👕 *Comparing models on shirt drying word problem*
- Mistral gets it right, ChatGPT gets it wrong
06:22 🏀 *Comparing models on ball in bag world problem*
- GPT-4 reasons perfectly, Mistral models struggle
08:51 🐍 *Comparing models on coding snake game in Python*
- GPT-4 codes full game, Mistral gives partial code
11:27 ⏩ *Demo of Mistral streaming responses*
- All models stream paragraphs quickly
12:50 👍 *Overall positive, ready to explore more APIs*
Made with HARPA AI
Mixtral is amazing. It's the first one I've seen that gets this question right: "In a totally classical family, a girl named Sally has 3 brothers, Alfred, Bernard and Charlie, who each have 2 sisters. How many sisters does Sally have?" And it even explains it well. Mistral AI is really trying to address the issues that plague other LLMs, and that's great; we'll end up with one we can finally trust a little bit more.
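For what it's worth, the riddle's arithmetic can be sanity-checked in a few lines of Python (a sketch of the intended reasoning, not anything the models actually run):

```python
# The riddle: Sally has 3 brothers; each brother has 2 sisters.
# In a "totally classical" family all brothers share the same sisters,
# so "each brother has 2 sisters" means the family has exactly 2 girls.
brothers = ["Alfred", "Bernard", "Charlie"]
sisters_per_brother = 2

girls_in_family = sisters_per_brother   # Sally plus one other girl
sallys_sisters = girls_in_family - 1    # Sally doesn't count herself

print(f"Sally has {sallys_sisters} sister(s)")  # Sally has 1 sister(s)
```

The classic LLM failure mode is multiplying 3 brothers by 2 sisters instead of noticing the sisters are shared.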
Um... what? GPT-4 can do this fine. Unless you mean for local AI; then yeah, this is by far the best local AI.
@@xbon1 That's the point. I have zero interest in using a remote system that can disappear or change whenever it wants, and that I have to pay for. Local AI LLMs are the future, for the simple reason that the workloads that will rely on them are generally not of the type you're willing to offload to a remote entity. I'm pretty sure that there are use cases for very large models such as GPT-4 that can solve more complex problems than the ones you can run locally. But 7B LLMs such as Mistral-7B can run fast on your smartphone or laptop right now. That's already better for your data than GPT-4 for a vast majority of use cases.
Thank you for testing it. At the moment I use the GPT API, but maybe in 2024 I will try some open-source models.
Cool, yeah give it a go :)
Eagerly anticipating Mistral’s debut in our upcoming Taskade Multi-Agent update! 🌈
So it doesn't have a friendly interface like OpenAI's Playground?
By the way, the weights of Mixtral 8x7B are released, so you can run it locally with enough RAM/VRAM.
Yes, but you need a LOT of RAM. So the only practical option is running quantized versions (TheBloke has them). I tried q4_k_m and q5_k_m but they performed pretty badly for me; maybe these mixture-of-experts models really need full precision to run well.
@@martinmakuch2556 I ran the same model on my hardware. It uses a lot of RAM: 32GB of system RAM and around 11GB of VRAM. Performance is around 5 tokens per second, which is around reading speed, so it's not lightning fast but it's fast enough to use. That's on a Ryzen 3700X and a Radeon 6700, and honestly I'm surprised how well this actually works on my system, given that 32GB is quite cheap nowadays.
Also, a new update was just released for Faraday that offers Vulkan and is supposed to be a lot faster on AMD hardware, but I can't seem to get it to work, so performance might be better than 5 tokens per second on my hardware once it works.
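To put rough numbers on the RAM question: a back-of-the-envelope estimate of the weight memory for Mixtral 8x7B at different precisions. The parameter count and effective bits-per-weight for the k-quants are approximations, so treat these as ballpark figures only.

```python
# Mixtral 8x7B has roughly 46.7B total parameters; only ~13B are active
# per token, but ALL weights must fit in RAM/VRAM to run it.
total_params = 46.7e9

# Approximate effective bits per weight for common formats/quants.
bits_per_weight = {"fp16": 16.0, "q5_k_m": 5.5, "q4_k_m": 4.5}

for name, bits in bits_per_weight.items():
    gib = total_params * bits / 8 / 2**30  # bits -> bytes -> GiB
    print(f"{name}: ~{gib:.0f} GiB of weights")
# fp16 ≈ 87 GiB, q5_k_m ≈ 30 GiB, q4_k_m ≈ 24 GiB
```

Which lines up with the ~32GB system RAM plus ~11GB VRAM reported above for the q4/q5 quants, and explains why full precision is impractical on consumer hardware.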
Have you come across any AI assistant functionality with tools (RAG, code interpreter, and function calling) from any of the models out there?
I appreciate you showing actual pricing numbers. Could we perhaps normalize showing specific numbers for things like this, as well as size requirements for local versions (when available)? That would be very helpful going forward.
Great overview. Thanks for showing the code tests.
Thanks a bunch for showing comparisons of the different models and how they perform.
Does it require a lot of resources to run it locally?
Appreciate this thank you
Mistral has my sympathy bonus. I am able to run this offline as well.
Do these LLMs pass AI detection?
Yeah most do anyway now :)
How do I finetune SOTA models? They're cool, but they don't allow me to make the most of them. Finetuning would solve that, and I'd pay for that, but they don't have such an option, and setting up everything locally manually is too complicated. GPT4, biggest Mistral model - I want to be able to fine-tune them!
What is the reason programs like this and Stable Diffusion are not "plug and play", with simpler installation?
It is simple
Is Mixtral running locally, or is it using the internet at all?
Yes, both are available: locally or over the internet.
thanks
do i need to pay for mistral?
How many tokens is 3k words?
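A common rule of thumb for English prose, as a rough estimate only; exact counts depend on the tokenizer, so for billing you'd use the model's own tokenizer (e.g. via a library like tiktoken):

```python
# Heuristic for English text: 1 token ≈ 0.75 words (≈ 4 characters),
# i.e. about 1.33 tokens per word. Actual counts vary by tokenizer
# and by content (code and non-English text tokenize less efficiently).
words = 3000
tokens_estimate = round(words / 0.75)
print(tokens_estimate)  # 4000
```

So 3k words is on the order of 4k tokens.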
Blog writer? I'm one too.
Just me, patiently waiting until a model this good is uncensored.
Will be weeks or less, since they released the base model
It's true. I think most people are the same.
Quite frankly, it's not very censored. You can already make it tell you jokes that cannot be repeated in public 🙂
FYI, the uncensored dolphin-mixtral version is already available.
@@carstenli Absolutely, but it seems slightly worse than Mixtral alone (whereas in the past Dolphin used to improve on top of Mistral).
Sorry but for the ball in the bag with a hole problem, I'm going to have to give a fail to
GPT-3
GPT-4
Mistral-small
Mistral-medium
GPT-All-About-AI
All of you failed the test, as none of the models understood all the important aspects of the problem. Of all the models, the Mistral models had the closest answers, but they still missed the small probability that the ball rolled over the hole in the bottom of the bag, dropped out, and the person did not notice. Mistral likely just assumed this was too silly a scenario to consider.
NGL, around 6-11 months ago I really liked your content, and I thought a channel labeled "All About AI" would also cover any big updates in AI, critically testing and comparing models, and so on. Yet I miss any and all content on new stuff such as Grok, Bard, Gemini, and so on. I just no longer find the content interesting, as it never covers the kind of AI stuff I am interested in, which is weird considering the channel name. I personally am unsubscribing, but I thought I should state why before doing so. Best of luck with your channel though; nothing against you personally, you seem like a cool dude.
Why do all this manually? Use GPT to write a test harness for this stuff and push it all to an Excel or CSV file for quick side-by-side review. Easy.
Because he wants to make "follow along with my tests and thoughts" content from it; it's hard to drag that out of something as efficient as your example.