I like that you are having the models write short stories. AI creativity is an important factor to me.
Good production. You're starting to relax more and be yourself in front of the camera, and the editing is good.
This was 100% scripted as usual, and I read it off a teleprompter.
Great video! It would be even more helpful if you could add a brief summary at the end where you share your thoughts on the model. Specifically, it would be valuable to hear your opinion on how well the model performs, its strengths and weaknesses, and the scenarios where it excels. Additionally, insights on how it compares to other models of similar size and what it might be equivalent to (e.g., “Llama 3.3 is comparable to Llama 405”) would make it much easier for us to categorize and understand its potential.
Hearing this information from an experienced professional like you saves us a lot of time, and while I know this is subjective, it’s incredibly useful. If possible, giving the model a score based on different categories would make your review even more insightful.
Thank you for your great work and dedication!
When I watched again after posting I realized I hadn’t done that. It’s a great point and it was something I meant to do. Thanks for pointing it out.
Wow - quite the machine you've got there. Congrats!
I very much appreciate the dedication, and I personally like the in-depth reviews. For starters, I appreciate that you take the number of parameters and quantization into account. Not everyone does, so thanks for bringing to people's attention that these variables are essential when evaluating models.
On the flip side: in this case, the questions are too vague and abstract for me to find any real business value in the video as a whole.
The point being: after watching this, I still don't have a grasp of how these models compare to each other... whereas I think there could have been massive business value in a video like this if questions with one objective, verifiable answer had been asked.
For example: you could have done this by looking at a handful of industries or use cases and then comparing questions relevant to each industry.
A simple and obvious example could be the coding industry, where several coding problems are asked and tested at the end (given the temperature, you might take three tries for each problem and then plot the results).
Another industry might be writing, where you for example put one or more wrong words in a paragraph and ask the models to find them, make that increasingly harder, run each test three times, and again plot the number of correct answers in a graph.
With some brainstorming (or 'prompting', as we call it today), better practical use cases could be found than what I can come up with while just typing this.
In the end there would probably have been a very clear picture of when it matters to have a higher parameter count or quantization level and when it doesn't.
In summary: make it practical by starting from real-world problems, taking into account temperature and the fact that these models are probabilistic, then asking objective, verifiable questions and plotting the results (a rough sketch of what that could look like follows below). That would have been massive business value for me: that's what I recently did for our company to select a proper model (an SLM), and this video could have been the one saving me hours.
Then again: thank you very much for your diligence and professionalism. I very much enjoy your content and I'm happy to see people find their way to your channel.
Cheers,
Andy
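The kind of benchmark Andy describes is straightforward to sketch. A minimal example, assuming the `ollama` Python client is installed and the listed model tags are pulled locally; the prompts, expected answers, and model names here are placeholders, not anything from the video:

```python
import ollama  # official Ollama Python client

# Hypothetical tests: each prompt has one objective, verifiable answer.
TESTS = [
    {"prompt": "What is 17 * 24? Reply with only the number.", "answer": "408"},
    {"prompt": "Find the misspelled word in: 'The cat sat on the matt.' Reply with only that word.", "answer": "matt"},
]
MODELS = ["llama3.2:3b-instruct-q4_K_M", "llama3.2:3b-instruct-fp16"]  # placeholder tags
RUNS = 3  # with temperature > 0 the answers vary, so repeat each test and count

scores = {}
for model in MODELS:
    correct = 0
    for test in TESTS:
        for _ in range(RUNS):
            reply = ollama.generate(model=model, prompt=test["prompt"])["response"]
            if test["answer"].lower() in reply.strip().lower():
                correct += 1
    scores[model] = correct

# Plot or tabulate these counts to see when parameter count or quantization actually matters.
for model, correct in scores.items():
    print(f"{model}: {correct}/{len(TESTS) * RUNS} correct")
```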
That was quite interesting. I watched all the way through, so I liked the longer format. I'm glad your channel is doing well. It is super cool to be able to run this stuff on local computers. I'm upgrading right now, but nothing like that monster you have to play with.
The question choice was interesting.
I wonder if the robot dreamed of electric sheep.
It was weird the way they failed to account for the extra killer in the room. Seemingly so obvious.
We are bump stick users compared to his hardware, which solves the LED problem:
People use the LED lights on their high-end LLM systems to replace traditional light bulbs; that's why they use more electricity after replacing traditional bulbs with LEDs...
It was great to see the responses side by side. One can see the difference immediately. What was interesting is that the smaller model generated more verbose answers.
I find your videos are well produced and formatted to cater to a wide audience. I have found them great to come back to as a reference, since I work with lower-level Ollama features infrequently. Specific to this video and these tests, I think it's interesting to look at repeatability and to make sure the output data can be correlated between models. In a situation like this, where you are comparing different versions of the same model, would using a statically set seed help? My assumption is that you would then be able to compare the output from the same starting point with each model.
If anything, I should have shown more repetition, so you could see how the answers change, and you get that when you don't set the seed. Setting the seed makes it less real-world. I would imagine setting the seed is only useful when you are doing some sort of automated testing.
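For anyone who does want the reproducible comparison the question describes, here is a minimal sketch assuming the `ollama` Python client; the model tags, prompt, and seed value are placeholders:

```python
import ollama  # official Ollama Python client

PROMPT = "Write a two-sentence summary of how quantization affects model quality."  # placeholder prompt
MODELS = ["llama3.2:3b-instruct-q4_K_M", "llama3.2:3b-instruct-fp16"]  # placeholder tags

for model in MODELS:
    # Using the same seed and temperature for every model means each run starts
    # from the same sampling state, so differences in output come from the model itself.
    result = ollama.generate(
        model=model,
        prompt=PROMPT,
        options={"seed": 42, "temperature": 0.7},
    )
    print(f"--- {model} ---\n{result['response']}\n")
```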
What's the most common winning🏆 investment strategy for a new beginner?
Hi Matt, thanks for the video. It is not the first time someone has used an LLM to "solve" logical/mathematical problems, etc., and I personally do not think this is the correct use of an LLM. I mainly see LLMs as a great tool for "converting" text into other or condensed text, extracting information and converting it into a more machine-friendly format, categorizing large volumes of text, and so on. Simply put: things you could do yourself, but the machine does them much faster, and you can always do the steps manually and check the results for correctness. Since you see logical reasoning as an LLM feature worth using (extract info and "logically" find relations, etc.), where and how would you use it in a real, live production system with confidence (I mean using the returned result in a further downstream process for every returned result)? I personally do not see how it can provide sufficiently confident results there.
Hello Matt.
What do you think about vRAM requirement?
I have 8 GB of vRAM:
stable-code:3b-code-fp16 5.6 GB - not enough vRAM.
starcoder2:3b-fp16 6.1 GB - works well, 8 GB is enough.
falcon3:3b-instruct-fp16 6.5 GB - works well, 8 GB is enough.
codestral:22b-v0.1-q2_K 8.3 GB - works well, 8 GB is enough.
How do I know how much vRAM it needs before downloading?
Also I think fp16 models are for coding and research papers, and q2_ for creativity and art.
Thank you.
I think in most cases stick with q4. It's the sweet spot, and most folks just can't see a difference between that and fp16.
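On the "how do I know how much vRAM it needs before downloading" question: a rough rule of thumb is parameters × bits-per-weight / 8 bytes for the weights, plus some headroom for the KV cache and runtime; the download size shown on the Ollama library page is also a decent proxy. A back-of-the-envelope sketch, where the bits-per-weight values and the 20% overhead factor are assumptions:

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough estimate: weight bytes = params * bits / 8, plus ~20% for KV cache and runtime."""
    weight_gb = params_billion * bits_per_weight / 8  # billions of bytes, i.e. decimal GB
    return weight_gb * overhead

# Roughly matches the sizes listed above (quantized bits-per-weight are approximate):
print(f"3B fp16    ~ {estimate_vram_gb(3, 16):.1f} GB")   # ~7.2 GB: tight on an 8 GB card
print(f"22B q2_K   ~ {estimate_vram_gb(22, 3.0):.1f} GB") # ~9.9 GB: may spill to CPU on 8 GB
print(f"22B q4_K_M ~ {estimate_vram_gb(22, 4.8):.1f} GB") # ~15.8 GB: needs a bigger card
```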
That gpu flex!
I’m just looking for a model that’s good at creative writing. Fantasy novels and D&D basically
17:20 you might have started it backwards: give him a Korean sentence whose meaning you already knew and ask him to translate it into English.
Ahh, good idea. I am sure there is a bunch of classic Korean literature online; I could have it and Google Translate both translate to English and then find a native Korean speaker to tell me which is the better translation.
@technovangelist There ought to be some books already in both Korean and English. My main idea was simpler though: just check whether the English he translated from a known Korean text made sense :P
Woo, which bank did you rob for that machine..? 👀
Did he just say LG?
Yup. That’s who made it.
Very bad prompting sir.