- Videos: 99
- Views: 23,374
Volko Volko
France
Joined Dec 7, 2018
Welcome to this open-minded, tech-guided channel, where I will share my projects with you and give you some tips and tricks ;-)
o1 is dead, long live o3! AGI is CONFIRMED by ARC PRIZE!
OpenAI made their 12th (and final) announcement.
They released o3, which is a huuuge performance gain.
The performance jump is higher going from o1 to o3 than from GPT4o to o1 !!!
You really have to see the video to fully understand
Views: 44
Videos
Gemini 2.0 Flash became ... o1 ? Gemini 2.0 Flash Thinking 🤯
Views 469 · 4 hours ago
In this video, we are going to take a look at the new Gemini 2.0 Flash Thinking that was just released
This new feature of Gemini 2.0 is ... SCARY
Views 378 · 9 hours ago
In this video, we are going to take a look at the Multimodal Live API with Gemini 2.0, available for free at aistudio.google.com. It can hear you, talk to you and see your entire SCREEN!
Gemini 2.0 vs GPT4o mini vs Claude 3.5 Haiku ! Gemini did SOOO BAD 😨
Views 974 · 21 hours ago
Gemini is reaaally not that great in real use cases. I don't really see why it is ranked so high in the benchmarks
Let's test QwQ, the new opensource alternative to o1
Views 1.3K · 1 day ago
In this video, I'm going to test the performance of QwQ, a new preview model released by Qwen that works the same way as OpenAI's o1
Llama 3.3 vs Llama 3.2 ! HUGE IMPROVEMENTS !
Views 1.7K · 14 days ago
Let's compare these two models !
Quantization is Simple! Here is how it works
Views 391 · 21 days ago
In this video, I'm going to show you how quantization works under the hood
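As a rough illustration of the idea covered in the video (a minimal sketch of symmetric round-to-nearest quantization, not the exact scheme any particular tool uses; the function names are mine):

```python
def quantize_int8(weights):
    """Map float weights to signed 8-bit integers plus one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127  # largest weight maps to +/-127
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights: each int is multiplied back by the scale."""
    return [q * scale for q in quantized]

weights = [0.02, -1.27, 0.63]
q, s = quantize_int8(weights)
restored = dequantize(q, s)  # close to the originals, stored in 4x less memory than float32
```

The restored values differ from the originals by at most half a scale step; that rounding error is the precision/size trade-off at the heart of quantization.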
BAD vs GOOD prompting
Views 775 · 1 month ago
Let's see in this video whether we still need good prompting nowadays and, if there is a difference, at what point it matters. Feel free to leave comments, even negative ones, as long as they are constructive.
Is Bigger Better ?
Views 277 · 1 month ago
In this video, we are going to test the performance of each model size and compare them against one another
Qwen2.5 Coder 32B vs GPT4o vs Claude 3.5 Sonnet (new)
Views 5K · 1 month ago
Let's see which model is the best
Ultimate Guide: Easily Quantize Your LLM in Any Format
Views 156 · 1 month ago
Today, I'm going to show you how I quantize all my models. Link: colab.research.google.com/drive/1b6nqC7UZVt8bx4MksX7s656GXPM-eWw4
New Chinese LLM beats Llama3.2 !!
Views 128 · 1 month ago
Let's test a new Chinese model and see how it performs against Llama 3.2 and proprietary models (the new Claude 3.5 Sonnet, GPT4o)
Get ChatGPT Without an Internet Connection | FR
Views 182 · 1 month ago
Welcome! Today, I'm going to teach you how to install a local AI similar to ChatGPT.
[Tutorial] [FR] | How to Install and Create a Virtual Machine with VirtualBox (2024)
Views 317 · 1 month ago
Today, I'm going to show you how to create your own virtual machines on your computer, so that you have a secure, clean environment.
How to repair iPhone that Keeps Restarting
Views 20 · 2 months ago
How to Fix the Folder Icon Issue on Mac
Views 31 · 2 months ago
How to access o1 (Strawberry) API & chat without tier 5
Views 347 · 3 months ago
OpenAI released GPT4o Mini | Let's test it !
Views 134 · 5 months ago
MathΣtral First Test ! Quite impressive results ! Mistral AI
Views 74 · 5 months ago
Make a DIY Onewheel - $200! Episode 2
Views 24 · 5 months ago
I Recreated The Finals, but it's Open Source !
Views 270 · 5 months ago
Gemma2:27 Ollama Correction ! Now Incredible !
Views 319 · 5 months ago
Gemma2:27B First Test ! How Can it be THAT Bad ?!
Views 315 · 5 months ago
Gemma2 First Test ! Incredible Results for a 9B model
Views 152 · 5 months ago
UNDERVOLT, make your pc COOL AGAIN !
Views 18 · 5 months ago
Hi, good video. Just wanted to ask: after exchanging messages with Gemini, does it give a "something went wrong" error so that you need to start a new chat?
Hi, I'm sorry, but no, I did not encounter a similar issue. Maybe reach out to Google support for help 🤷 Thanks a lot for this kind comment ^^
You can type in "provide the whole file", and you can also use "no yapping"; then it will only write what you need, just code.
Great advice!
Thanks, because I'm paranoid about installing things, especially after a virus multiplied itself 800k times on my old PC.
You're welcome ^^ Enjoy your Virtual Machine!
Why 3.5 Haiku? Sonnet has higher scores.
It's the smaller model, for a fair comparison. You wouldn't compare it with o1.
Yes, that's true. But the release was for Gemini 2.0 Flash. I needed to compare it against models of equal size/price, as @nashh600 correctly pointed out.
Interesting; what persona did you select?
Do we need to select a persona for Gemini?
@ I don't know about Gemini, but for the others, I would strongly recommend it to get better results.
Btw, QwQ can totally do multi-turn. Set it to 32k context and 16k output tokens so its thinking isn't cut off before it's done. llama.cpp has many more settings.
Oh okay, I didn't know that. I thought it couldn't do multi-turn because it's single-turn only in the QwQ Space ^^ Thanks a lot for the clarification!
Tetris game is often my coding test and they all struggle with it.
Yes, Tetris is quite difficult for LLMs. Only Claude 3.5 Sonnet and Qwen2.5 Coder 32B got it right in my tests. Even GPT4o didn't get it (but I think that was more down to luck).
hey! Would it work with a 3060ti and 32gb ram?
I mean, you can't fit the required 24 GB of VRAM on that graphics card, but hey, there's only one way to find out if it works, right?
@@hatnis well, it was free to ask 😅
Yes, but you will have to offload a lot to your CPU/RAM. It will run pretty slowly, but it will work 👍
In the video, I ran it in my 24GB of VRAM. I think it is q4_k_m.
I was able to get it working on my new Mac mini (base M4 Pro chip model), using QwQ-32B-Preview-GGUF from the bartowski repo, IQ3_XS quantization. It was the only one I could download, as this one is 13.71 GB. Note that because I am using a Mac mini, Apple's RAM is unified, so my 24 GB of RAM is shared between the GPU and CPU. If I had spent an extra $300 on top of the $1.4k I paid for the M4 Pro model, I could have loaded the max quantization, but I don't really do AI locally, as I mostly use online AI services. I hope this helps!
Was really fun to see how 3.3 made Tetris, worked amazingly. Was sad to see how 3.1 failed badly at making Tetris though...
Yeah, I wasn't expecting 3.1 to be THAT bad either 😅
Great video, liked and subscribed.
Ohhh that's awesome man !!! Thanks a lot ^^
I liked and subscribed to your videos too 😉
no difference except for hair style
🤣🤣🤣🤣
Great and really quick content man! Keep up the good work
Thanks a lot man 🙏👍👍
What are your build specs to run 3.3 at this speed?
I'm using Groq; they have specialized LPUs (hardware built for LLM inference) that allow insane speeds, and it is free to use (you even get an API). There is also SambaNova, which offers the same thing.
Where are you from?
France and you ?
how much vram required?
24GB at the very least (with heavy quantization like q2-q3). You absolutely need at least one 3090/4090. Otherwise, I would recommend running it in RAM on the CPU: have at least 64GB of RAM and run it at q4-q5.
Unquantized, it is about 130GB.
But you can use it for free on Groq and SambaNova.
@@volkovolko I'm going to run it with 8GB of VRAM and 32GB of DDR4 RAM, as usual. Q2 is around 26 GB, so it shouldn't go to swap.
I'd highly doubt you'd be able to run it with that. I have a RTX 3070 and 48GB of DDR4 RAM, and I tried running Llama 3.2 11b vision and it doesn't even load.
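The arithmetic behind the VRAM/RAM figures in this thread can be sketched as follows (weights only; the KV cache and runtime overhead add several more GB on top, so treat it as a lower bound):

```python
def approx_weights_size_gb(params_billions, bits_per_weight):
    """Approximate size of the weights alone: params * (bits / 8) bytes, in GB."""
    return params_billions * bits_per_weight / 8

# A 70B model: ~140 GB at fp16, ~35 GB at 4-bit --
# roughly matching the "about 130GB unquantized" figure above
fp16_gb = approx_weights_size_gb(70, 16)
q4_gb = approx_weights_size_gb(70, 4)
print(fp16_gb, q4_gb)
```

This is why even a 24 GB GPU cannot hold a 70B model without heavy quantization or CPU/RAM offloading.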
Not a great test to me, because these models have been trained on these games and the code is in there. Let's try something custom and see how it can reason, create and solve problems; that's what makes a good model. Also, Claude 3.5 Sonnet is the best coder and very rarely makes mistakes when coding.
I would be happy to test any prompt you give me ^^
Btw, your thumbnail isn't readable; you should make it bigger.
Hummm, I don't really understand 😅😅
Is it possible in French?
Yes, it's possible; it's just that most of my audience is English-speaking.
@@volkovolko Hmm... well, it's really interesting. What would also be cool would be to compare the different formats: GGUF, AWQ, GPTQ. Granted, it's always a way of rationalizing the floating-point values (and therefore the number of bits, and therefore the model size), but I think the big AI progress from now on will be in quantization: compressing models as much as possible to make them slightly less precise but far more practical (usable without racks of GPUs). Anyway, thank you.
I completely share that view.
There's a little issue with the video cutting and such, but I did get a lot of information from it!
Yes, I had some issues with Clipchamp, which I use to edit the videos. But I'm glad it helped you!!
How do you install the ISO, please?
Here: www.microsoft.com/fr-fr/software-download/windows11 ^^
What are the factors that make a good prompt? Would be great if you could reply.
Be quite verbose: explain exactly what features you want, keep it well ordered and formatted, and use prompting techniques like Chain of Thought. In practice, I first ask ChatGPT or any other LLM to create a good prompt for me, then I feed that prompt back to it. It gives way better results.
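To make those tips concrete, here is a small helper that assembles a verbose, well-ordered prompt (the layout and function name are just one plausible choice, not a canonical format):

```python
def build_prompt(task, features, constraints):
    """Assemble a structured prompt: explicit task, ordered features, constraints."""
    lines = [f"Task: {task}", "", "Required features:"]
    lines += [f"- {feature}" for feature in features]
    lines += ["", "Constraints:"]
    lines += [f"- {constraint}" for constraint in constraints]
    # A light Chain-of-Thought nudge, as suggested above
    lines += ["", "Think step by step before writing the final code."]
    return "\n".join(lines)

prompt = build_prompt(
    "Write a Tetris game in Python",
    ["piece rotation", "line clearing", "game-over detection"],
    ["a single file", "use only pygame"],
)
print(prompt)
```

Spelling out every requirement like this, instead of just asking "make Tetris", is exactly the difference the BAD vs GOOD prompting video explores.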
Now make it create a Minecraft JNI client with a basic autosprint module. Don’t think it can do that yet.
I don't even know what it is 😅😂😂
Knowing how to make good prompts is a useful skill nowadays
Yes it is !!
Thank you for this demonstration. In the future, please work on more complex apps. I’m happy you tried Tetris instead of only the snake game.
Yeah, the issue is that we need to balance the complexity of the tasks. If it's too easy, all models get it right, so we cannot compare them. If it's too difficult, all models fail, so we cannot compare them either. Tetris and Pac-Man currently seem a good fit for SOTA models and aren't that heavily tested, which is why I use them.
Even if LLMs became extremely powerful and could answer a prompt perfectly, prompt engineering would remain important. One of the most important things is making the LLM understand our intention, and that's not always obvious, especially if there isn't enough context, or if some context pollutes the answer. The example I have in mind is asking it to write an email in a formal style, and it uses lots of complicated words that make the result sound unnatural.
Yes, though there are often "default" options. If I ask it to create a Python program, it will systematically give me a hello world. So you might think you don't need to spell out certain "default" details (as would be the case with a human), but in reality doing so improves the quality of the answer. However, I think that in the future we won't need to specify these "defaults".
This really does show a lot. I think the issue was that the FPS was 60 in the second one, which caused the ghosts to move really fast. Overall, I think this showed that good prompting will always give better results, depending on the question.
Yeah, before this test I thought that nowadays we don't need good prompting anymore, especially given ChatGPT's system-message prompting and the fact that it asks the model to reformulate the question before answering. However, it turns out that I was completely wrong.
Maybe you need good and specific prompting to generate Pac-Man.
Yes, you're right. Providing a better, longer and more precise prompt would surely have improved the quality of the LLM's answer. However, the video would have been longer, and I think that in real life most people are lazy and don't give the LLM a well-crafted prompt. Maybe I will make a video comparing how big the performance gap is between a good and a bad prompt.
I understand all models are quantized at 4 bits in this video? Can you do another video with 8-bit quantization for the 3b, 7b and 14b models? A higher quant may improve their quality. It could be very useful info!
Yes, they are 4-bit, as recommended by Ollama. However, I can make a new video trying 8-bit as you asked, though I think the results will be pretty much the same. Maybe I should instead make a video comparing each quant of the 7b.
@volkovolko Yes, in my experiments with LLMs so far, it looks like for small models (<8b params) quantization has a noticeable impact on output quality, while for larger models the impact is less noticeable.
Thanks for the comparison but this was painful to watch. Please cut the parts that are not relevant to the subject or at least add timestamps
I'm trying to do my best. When I made this video, I didn't have any speakers, so I couldn't test the audio or make great cuts.
Upgrading MacBook RAM Speedrun (WORLD RECORD)
😂😂😂
If you do a real software project, you'll find that Claude Sonnet (new) is the best, and GPT4 is very good at organizing.
I do real software projects, as I'm a developer. While Claude and GPT4o are still better for big projects, Qwen is a good alternative for quick, simple questions, saving a trip to Stack Overflow.
amazing, thanks for the test
Glad you liked it!
Now this is my kind of stuff. Personally, I find 0.5b incapable of any sort of coding; in my experience, it gets really bad results compared to 3b, 7b, etc., as it's made for small coding tasks and code fill-in. Thanks for showing this comparison though! I also recommend benchmarking them on making Discord bots, as that would show a variety of new things. 3b is capable of making a lot more than GPT-4, since GPT forgets variables, portions of code, etc., while Qwen is capable of creating Discord bots that run with no errors on start. Love your videos, don't ever stop making them ❤
So for you, a 3B Qwen can be better at Discord bot coding than GPT4o? (Not a judgment, I just want to know.)
@volkovolko Yeah, it does a lot better at importing packages, using Python properly, and even does well at managing errors gracefully.
Nice video, but I think Claude is still better. When I compare models, I always ask myself: if the models are close to each other in technical specifications, it's fine to compare them, but if not, what's the point? I understand comparing open-source models like Qwen and Llama against each other, or closed-source models like GPT4o and Claude 3.5 Sonnet.
Yes, the results of the tests in this video seem to show that: GPT4o < Qwen2.5coder32b < Claude 3.5 Sonnet (new).
The point is to compare quality... simple as that. Once you know quality, you can consider other factors like speed, price, availability, and of course confidentiality. The fact that Qwen2.5-Coder-32B is even close to Claude while being a _small_ open-weight model is amazing. Of course other factors can matter more than just quality. Speed and price are just as important. But limiting it to "Only compare quality when technical specs are comparable" makes no sense.
@@sthobvious It actually makes sense, because if you think about comparing GPT-3.5 with o1 or GPT-4o, do you really think that's fair? GPT-3.5: 😭 GPT-4o & o1: 🗿🗿
the error produced by gpt was minimal; a "hallucination"
Sweet. I remember, when Chat GPT just appeared, feeling very pessimistic that this tech would be locked in big companies datacenters. Glad I was wrong
Yes, it's so awesome that this technology is going toward open sourcing 👍
You should ask for physics demos like softbody, particles, fluid particles, cloth. Anything math-heavy, pretty much.
Okay, I will try in the next video
Why do people do these stupid tests where the code can be found 1000 times on the internet.
As explained in the video, I'm looking for other original tests. If there's one you want me to try, feel free to leave it in a comment so I can include it in a following video.
@@volkovolko If you are testing whether it can write a snake game, you are basically testing knowledge retrieval, because that code exists in 1000 variants on the Internet. It gets interesting if you demand variations, like "but the snake grows in both directions" or "random obstacles appear and disappear after some time, not too close to the snake". Think of whatever you want, but whether a model can do Tetris or Snake is hardly a test for LLMs these days.
@5m5tj5wg The "better" model is not the one that can retrieve known solutions better, but the one that can piece together a solution to an unheard-of but related problem better. If you can find the question and the answer on the net, then comparing a model with 32B params to a multi-hundred-billion-parameter model like GPT4o or Sonnet makes even less sense, because of course they can store more knowledge. You need to ask for solutions to problems where you cannot find the answer on the Internet to evaluate how good a model will be in practical use.
Yes, there is some truth to that. However, I think you can all agree that you don't want a 50+ minute video. Also, most of the code you ask a model to write in the real world is knowledge retrieval too; as developers, we very often have to remake what has already been made. And the Snake game isn't that easy for LLMs. Tetris is very difficult, and I have never seen a fully working first try.
And it is interesting to see that the Qwen model did better on these "retrieval" questions than GPT and Anthropic despite being way smaller in terms of parameters. It indicates that knowledge can still be compressed a lot more than we thought.
try a next js app.
Okay, I will try in the next video
Funny thing: I tried the same Tetris example locally with the q8 and fp16 versions of Qwen Coder 2.5 32b, and it generated buggy code in both cases. When I tried the default quantization (q4_k_m, if I'm not mistaken), it got it perfect the first time (properly bounded, and you could lose the game too). I guess there's a luck factor involved.
Yeah, it might be the luck factor. Or maybe the Qwen architecture is optimized for high quantization levels 🤷‍♂️ Or maybe your q8 version wasn't properly quantized; I think they updated their weights at one point.
Luck is called temperature nowadays :D
Yeah, I know. Top_k too, right? @@66_meme_99
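For readers wondering what these knobs actually do, here is a minimal sketch of temperature plus top-k sampling (a simplified version of what inference engines implement; real ones work on full logit tensors):

```python
import math
import random

def sample_token(logits, temperature=0.8, top_k=40):
    """Pick a token index: keep the top_k highest logits, rescale by temperature, sample."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]  # lower T -> sharper distribution
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]   # stabilized softmax numerators
    return random.choices(top, weights=weights, k=1)[0]
```

At a very low temperature the highest-logit token is picked almost every time; raising the temperature (or top_k) widens the choice, which is exactly the "luck factor" discussed in this thread.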
I think you should ask Qwen 2.5 Coder 32B again to make the Tetris game better, so it will be fair. In my opinion, Qwen literally won the Tetris round; even Claude generated better code after the error, but of course it failed at first.
Yeah, for me the win went to Qwen. But okay, in the following videos I will always give every model a second chance. I will soon make a video comparing each size of Qwen2.5 Coder (so 0.5B vs 1.5B vs 3B vs 7B vs 14B vs 32B), so subscribe if you want to be notified ^^ I also started quantizing each model to GGUF and EXL2 on HuggingFace for those who are interested: huggingface.co/Volko76
Seems very interesting, I will try it tomorrow. For me, Nemotron 70B was the best, but even on my 4090 I can't run it locally.
I made the video comparing sizes : ruclips.net/video/WPziCratbpc/видео.htmlsi=o3eKo-3pGY78wmMr
Yes, 70B is still a bit too much for consumer grade GPUs
Is this fake?
Well no, why?
nice vid! what's your 3090 setup my guy
Asus ROG STRIX 3090, 32GB DDR4 3200MHz, i9-11900KF
This is pretty cool to see! It's nice to see how the models compare between each other. For me, even the 3B model was amazing at making a Python snake game. Thanks for the comparison, it really does show the difference.
Yeah, I totally agree. The Qwen series (especially the coder models, for me) are just so amazing. I don't know why they aren't as well known as the Llama ones.
Do you want me to make a video comparing the 3B to the 32B ?
@@volkovolko Yeah, that would be really cool to see! I'd love to see how the models perform.
Okay, I will try to do it tomorrow
You have the same voice as UNITY FR lol
Ah lol, well good, I like his voice xD
Germany countryhuman: too much too much oh yeah
?
@ Germany countryhuman: me singing
Germany countryhuman: Hello Napoleon!
Wtf ?
@ Germany countryhuman: do support me
How can you upload a file or an image? Thanks.
You just need to install version 4 (currently in alpha) of Ollama: github.com/ollama/ollama/releases/download/v0.4.0-rc5/ollama-windows-amd64.zip Then you can install llama3.2 and use images. Otherwise, if you want, you can build a RAG (but that's more complex).
LLM
Yes, it is. ChatGPT is also an LLM
Hello, and above all, thank you very much for your video 🙏🙂
Ah, it always feels really good to help people 🙏🙏