Good production. You're starting to relax more and be yourself in front of the camera, and the editing is good.
This was 100% scripted as usual, and I read it from a teleprompter.
I like that you are having the models write short stories. AI creativity is an important factor to me.
Wow - quite the machine you've got there. Congrats!
Great video! It would be even more helpful if you could add a brief summary at the end where you share your thoughts on the model. Specifically, it would be valuable to hear your opinion on how well the model performs, its strengths and weaknesses, and the scenarios where it excels. Additionally, insights on how it compares to other models of similar size and what it might be equivalent to (e.g., “Llama 3.3 is comparable to Llama 405”) would make it much easier for us to categorize and understand its potential.
Hearing this information from an experienced professional like you saves us a lot of time, and while I know this is subjective, it’s incredibly useful. If possible, giving the model a score based on different categories would make your review even more insightful.
Thank you for your great work and dedication!
When I watched again after posting I realized I hadn’t done that. It’s a great point and it was something I meant to do. Thanks for pointing it out.
I very much appreciate the dedication, and I personally like the in-depth reviews. For starters, I appreciate that you take the number of parameters and quantization into account. Not everyone does, so thanks for bringing to people's attention that these variables are essential when we try to evaluate models.
On the flip side: in this case, the questions are too vague and abstract for me to find any real business value in the video as a whole.
The point being: after watching this, I still don't have a grasp of how these models compare to each other... whereas I think there could have been massive business value in a video like this if questions with one objective, verifiable answer had simply been asked.
For example: you could have done this by looking at a handful of industries or use cases and then comparing questions relevant to each industry.
A simple and obvious example is the coding industry, where several coding problems are asked and the answers tested at the end (given the temperature, you might take 3 tries for each problem and then plot the results).
Another industry might be writing, where you for example put one or more wrong words in a paragraph, ask the models to find them, make that increasingly harder, run each test 3 times, and again plot the number of correct answers in a graph.
With some brainstorming (or 'prompting', as we call that today), better practical use cases could be found than what I can come up with while just typing this.
In the end there would then probably have been a very clear picture of when a higher parameter count or quantization level matters and when it doesn't.
In summary: make it practical by starting from real-world problems, taking into account temperature and the fact that these models are about probability, then asking objective and verifiable questions and plotting the results. That would have been massive business value for me: it's what I recently did at our company to select a proper model (an SLM), and this video could have been the one saving me hours.
Then again: thank you very much for your diligence and professionalism. I very much enjoy your content and I'm happy to see people find their way to your channel.
Cheers,
Andy
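[Editor's note] The evaluation protocol Andy describes above (several trials per question at a fixed temperature, counting verifiably correct answers, then plotting per model) could be sketched roughly like this in Python. This is a minimal illustration, not a real benchmark: `ask_model` is a hypothetical stand-in for an actual model call, and the problem set and trial count are placeholders.

```python
import random

def ask_model(model: str, question: str, temperature: float = 0.8) -> str:
    # Hypothetical stand-in for a real model call (e.g. an HTTP request to a
    # local inference server). Simulated here to keep the sketch self-contained.
    return "42" if random.random() < 0.7 else "not sure"

def score_model(model: str, problems: list[tuple[str, str]], trials: int = 3) -> dict[str, int]:
    """Ask each problem `trials` times and count verifiably correct answers."""
    scores: dict[str, int] = {}
    for question, expected in problems:
        correct = sum(
            ask_model(model, question) == expected
            for _ in range(trials)
        )
        scores[question] = correct  # value in 0..trials, ready to plot per model
    return scores

problems = [("What is 6 * 7?", "42"), ("What is 40 + 2?", "42")]
print(score_model("some-7b-model-q4", problems))
```

Running `score_model` once per model/quantization combination and charting the per-question counts would give the objective, repeatable comparison the comment asks for.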
It was great to see the responses side by side. One can see the difference immediately. What was interesting is that the smaller model generated more verbose answers.
That was quite interesting. I watched all the way through, so I liked the longer format. I'm glad your channel is doing well. It's super cool to be able to run this stuff on local computers. I'm upgrading right now, but to nothing like that monster you get to play with.
The question choice was interesting.
I wonder if the robot dreamed of electric sheep.
It was weird the way they failed to account for the extra killer in the room. Seemingly so obvious.
We are bump-stick users compared to his hardware, which solves the LED problem: people use the LED lights of their high-end LLM systems to replace their traditional light bulbs, and that's why they use more electricity after switching from traditional bulbs to LEDs...
What's the most common winning🏆 investment strategy for a new beginner?
Hi Matt, thanks for the video. It's not the first time someone has used an LLM to "solve" logical/mathematical problems, etc. I personally don't think this is the correct use of an LLM. I mainly see LLMs as a great tool to "convert" text into other or condensed text, extract information and convert it into a more machine-friendly format, categorize large volumes of text, etc. Simply put: things you could do yourself, but the machine can do much faster. And you can always do the steps manually and check the correctness of the result. Since you consider logical reasoning an LLM feature worth using (extract info and "logically" find relations, etc.): where and how would you use it in a real production system with confidence (I mean, feed the returned result into a further downstream process, for every returned result)? I personally don't see how it can provide sufficiently confident results there.
That gpu flex!
I’m just looking for a model that’s good at creative writing. Fantasy novels and D&D, basically.
17:20 You might have started it backwards: give it a Korean sentence whose meaning you knew and ask it to translate it into English.
Ahh, good idea. I'm sure there is a bunch of classic Korean literature online; I could have Google Translate render it into English and then find a native Korean speaker to tell me which is the better translation.
@@technovangelist There ought to be some books already in both Korean and English. My main idea was simpler though: just check whether the English it translated from a known Korean text made sense :P
Very bad prompting sir.
Woo, which bank did you rob for that machine..? 👀
Did he just say LG?
Yup. That’s who made it.