🇫🇷 Mistral AI's NEW 22B Coding Model with Code Inpainting 🎨 Beats DeepSeekCoder 33B!
- Published: 29 Sep 2024
- Meet Codestral, the game-changing code generation model from Mistral AI! This powerful tool assists developers with code completion and interaction through an easy-to-use API. Codestral surpasses the competition, even beating DeepSeek Coder 33B and Llama 3 70B! Unlock your coding potential and boost your productivity with Codestral.
Tell us what you think in the comments below!
Maxime Tweet: x.com/maximela...
Mistral Blog Post: mistral.ai/new...
La Plateforme (use Codestral FREE): chat.mistral.a...
Hugging Face Card (weights): huggingface.co...
-----------------
This video contains affiliate links, meaning if you click and make a purchase, I may earn a commission at no extra cost to you. Thank you for supporting my channel!
My 4090 machine:
amzn.to/3QMvE4s - MSI 4090 Suprim Liquid X 24G (best linux compatibility)
amzn.to/3V5R0My - Corsair 1500i PSU
amzn.to/4dIwybZ - 12VHPWR Cables that DON'T MELT!
Tech I use to produce my videos:
amzn.to/4bN5eaR - Samsung T7 2TB SSD USB-C
amzn.to/4dJFHky - Sandisk 32Gb USB-C flash drive
amzn.to/44LHZeG - Blue XLR Microphone
amzn.to/3ULTT3N - Focusrite Scarlett Solo Usb C to XLR interface
"We all know what happened with Devin." Nice engagement bait. Just tell us what you mean. But instead, you bait us for engagement.
Thanks for the feedback, I assumed it was well known that Devin was caught faking their demo about a week after announcing their model.
Looks good. Let's see it in a real workflow.
What would you like to see? Webdev, Solidity / web3? I'm all ears!
Nice video and test of Codestral, but if you're going to do a snake implementation or some other visual program, please run it. Need to add some pizazz. Also, it's great to have some competition from Europe; I always look forward to what Mistral releases.
Thanks for the feedback! I wanted to keep the video under 20 min! Will do a full demo next time.
Yup, it's really good. Tried it in 4-bit; I like its explanations so far.
Great to hear! I can't wait to try 8 bit quants once I get back to my GPU machine! :)))
Any suggestions on something better than GPT-4o? I feel like it's not that hard to run a tree and retrieve and dump context at each node along it.
I really like the coding AIs, but what feels like a great downside is that none of them are capable of CRUDing (create, read, update, delete) files directly. When they will be able to do that, I think they will be radically more useful.
Good point! I'll add this in the next video. I have noticed these models even struggle to string together relatively simple TypeScript / React apps.
They didn't include CodeQwen1.5-7B-Chat; it actually scores higher on HumanEval than Codestral and is way smaller (7B vs 22B). I tried both, and CodeQwen is actually better.
I haven't tried CodeQwen yet, but I've definitely been impressed with Qwen 1.5 - what kind of coding do you do with this model?
There have been so many changes in JavaScript, HTML, and CSS in the last couple of years; why would a web dev want to use a tool that is only trained to 2001?
Base reasoning is key, because it means finetuning on top of newer javascript docs / code is even easier and translates to solid performance after the fact.
@@aifluxchannel and yet every coder AI model I tried has produced such flaky code it hurts to read it. Even taking into account they might not be trained on new functionality.
Excellent overview. Personally my goal is ultimately to only use locally running models, so this is an exciting step!
Which models are you planning to run locally!?
Cool, a Mandelbrot set; that's the only use case I have for code gen. Literally the most useful code ever. In my entire career of 30 years, I can't say I ever needed or even felt the urge to write a Mandelbrot set.
Why don't people use real-life tasks, like writing a React login form with unit tests and e2e tests, plus backend verification with a Node Express server and a database, again with unit tests? Have it explain the security techniques used to protect against hacking and credential theft. This is needed in almost every app.
Until these things can be done flawlessly (passwords encrypted in the DB, TLS-enabled connections, data validation to avoid code injection, 2FA, CORS, SSO with Google, secure session handling, a sane DB account schema, RBAC, and so on), they won't be replacing anyone.
I generally like to stick to tasks that a human could do, but also tasks that don't take too much time to demo. I generally find that a lot of coding models will "explain away" things they're unsure how to actually implement with pseudo code or explanations of "best practices" - but also because they're just regurgitating documentation when that happens. What else would you like me to focus on / change in future videos when I'm evaluating coding performance?
Which AI do you recommend for coding?
I generally use DeepSeek Coder 33B and GPT4 ;)
I feel like i missed something about Devin
Devin turned out to have faked their demo, and in reality was actually quite far away from "replacing software engineers" with ai ;)
@@aifluxchannel Ahhh. Makes sense. Not much better than repeatedly prompting other models, I imagine?
That's unfortunate, but at least it spawned some open source projects to try and do what they pretended to do, I suppose.
Doing some testing, it can be quite lazy and its creativity is low, although its coding abilities are definitely sharp and I have yet to hit any bugs. The way I would use this would be for autocomplete and pseudo code (you have to be quite detailed).
Interesting, thanks for sharing your results. Curious what terms / attributes you use to measure how "creative" a coding LLM is? This might help me improve how I test models in the future!
Many thanks for details in coverage of topic.
Glad it was helpful!
An interesting model, but unimpressive in my testing. Although, it seems to be dependent on the language and problem difficulty - high resource languages with simpler problems are more likely to succeed.
Coming from the computer architecture side (hardware design), I always test the models on low-level C and Verilog problems (relatively simple, due to low expectations). GPT 3.5 and Llama3-70B succeeded more often than not, but Codestral failed all of my test cases. In fact, Codestral broke math by insisting that a*b == a+b if b is odd, else a random number (whatever was previously stored). When I pointed out the contradiction, it only doubled down. Llama3-70B and GPT 3.5 have never failed that badly for me.
It's been a while since I've written verilog, but definitely an interesting edge case to test Codestral with. What kind of work do you generally use LLama3-70B to assist / accelerate?
@@aifluxchannel It's a fun yet frustrating language.
I haven't been using LLama3-70B to assist with any hardware tasks, it still fails on anything useful (only succeeds at simple tasks).
GPT4 can sometimes generate more complicated Verilog, but usually requires manual correction. It's mostly useful for generating sub-function behavior in tooling (C and python). That still requires manual guidance, but speeds up development by ~10x. I would be more hopeful of LLama3-400B, but I guess that won't be released.
I've been having great success with WizardLM's Mixtral 8x22B model for coding.
My workflow is pretty simple: I use text-generation-webui to talk to my models and the Spyder IDE in another window, and just talk to the LLM like a normal person.
It'll be interesting to see how similar the evals for those two models are. Given they're the same size, I wonder if this is just a super-sampling of one of the "experts" from their 8x22B model.
Ooh, interesting hypothesis. I noticed it was a 22B model they released and wondered if it was related in some way to their 8x22B model. @@aifluxchannel
You should test out if it can write and run DreamBerd, the greatest language ever.
Hahaha can't tell if this is a joke or a real programming language?
Check your title bro
Thanks for the tip!
I like to use Claude 3 Haiku for coding. I can always use Opus for things like coming up with the coding project itself, or to ask tricky technical questions. I talk to Haiku about the implementation and the plan, then get it to come up with some unit tests, then get it to write the code. Getting it to think a bit before generating the code seems to get it to generate good code.
Thanks for sharing! Have you used the new phi-3 as well? Curious what kind of coding you're using this for?
@@aifluxchannel I often have conversations with Claude about maths and physics. Writing some code to do some calculations is a good way to familiarise oneself with relevant concepts, and more fun than doing calculations by hand with a pen and paper. A recent project was to implement a homomorphism and representations of Lie groups that are related to quantisation of spin. I haven't tried phi3. It looks like some versions have a decent context length, but I find that Claude's context length isn't quite enough for the way I use it.
I’d like to see a model tuned to a specific language other than Python and JS derivatives. Elixir is a prime candidate with an excellent documentation library (hex docs)
Base model training literally needs hundreds of millions of lines of code.
It would be interesting to train the model with as little documentation / english commentary and context to see if a more accurate or actionable model would come from it.
@@aifluxchannel do you think fine tune would be sufficient? I think with elixir, outside of the documentation, open source repositories would be of higher quality because of the skill involved to become productive compared with Python and Js
Phi has 128k context and is only 4B?
It's more about how you use the context window than its length ;)
Wait, what happened to Devin?
Demo was fake; it wasn't actually as capable as its creators claimed.
@@aifluxchannel Ah, crazy! I guess it was good enough for Microsoft.
Why are you not running the code ?
I can do this in livestreams, but for model review videos it takes too much time. thanks for the suggestion.
code inpainting is a brilliant concept!
I think it could become a really popular way to interact with coding models, especially if you could point / direct where you want it to focus in a codebase with comments.
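Mechanically, fill-in-the-middle ("inpainting") prompts give the model the code before and after a hole and ask it to generate only the missing middle. Here's a minimal sketch of the prompt assembly; the `[SUFFIX]`/`[PREFIX]` sentinel strings below are placeholders, since each FIM model defines its own control tokens (check the model card for the real ones):

```python
# Sketch of fill-in-the-middle prompt assembly. The sentinel strings are
# placeholders: real FIM models each define their own control tokens.
def build_fim_prompt(prefix: str, suffix: str) -> str:
    # Many FIM formats put the suffix first, so the model generates the
    # middle immediately after reading the prefix.
    return f"[SUFFIX]{suffix}[PREFIX]{prefix}"

prefix = "def circle_area(r):\n    "
suffix = "\n    return area\n"
prompt = build_fim_prompt(prefix, suffix)

# The model's completion is then spliced back between prefix and suffix.
completion = "area = 3.14159 * r * r"  # what a model might return
patched = prefix + completion + suffix
print(patched)
```

This is also why pointing the model at a specific spot with comments works well: the hole's location is explicit, so the prefix and suffix fully constrain what the model should fill in.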
thanks for this experience
Thanks for watching!
But can it write ffmpeg commands?
yes
Also, you can paste in the newest documentation; then it works even better.
@@mirek190 Have you tried? I guarantee you haven't. Not even GPT-4 can do anything more complicated than mp3 -> ogg, and even struggles with something simple like that.
GPT4 and Mixtral 8x7B are particularly good with these commands. This was one of the first things that really impressed me about these models.
It can do things much more complicated! You should try it out.
@@aifluxchannel We must be prompting it differently then. :-/
For example (real example): I wanted to input my two camera videos, convert them from fisheye to equirectangular, combine them with one on the left and the other on the right (stereo), crop 120 pixels from left and right of both, move the right down 180 pixels (bad lens alignment from manufacturer), then scale the entire output to no more than 8K. GPT-4 was nowhere near being able to write the command. (I never did figure it out. I'm doing those operations manually in Blender.)
thanks for the vid ♥
You bet! Let us know what you'd like to see more of!
How about AutoCoder 33b?
We can test this soon! Is this your go-to coding model?
@@aifluxchannel No. There's just very little video about this model.
Yes, it beats it!
Pretty exciting isn't it? What kind of finetunes do you want to see done to this mistral model?
super cool!
Thanks, we're glad you liked it!
But GPT-4o is barely decent at coding; I can't imagine the open-source stuff will be remotely useful if GPT-4o can't do 90% of coding tasks more complex than an intro-to-coding-course type thing.
I do generally agree that GPT-4o (outside of OpenAI's demo) is basically useless for real coding tasks. Especially as a co-pilot.
the "opensource stuff" isnt lacking behind. and yes, there are a lot of problems with using llm's for coding tasks. you need to be very carefull
I was creating a BFS-generated maze using 4o; so far, for me, it's impressive.
He means for anything complex @@yongamamkolokotho9904
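For reference, a BFS-based maze generator like the one mentioned above is only a few lines, so it's a reasonable quick test for a coding model. A minimal sketch (the adjacency-dict grid representation is my own choice, not anything from the video):

```python
import random
from collections import deque

def bfs_maze(width, height, seed=None):
    """Carve a maze as a randomized-BFS spanning tree over a grid.

    Returns an adjacency dict: cell -> set of cells it has an open
    passage to. Every cell is reachable and there are no loops.
    """
    rng = random.Random(seed)
    passages = {(x, y): set() for x in range(width) for y in range(height)}
    start = (0, 0)
    visited = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        neighbors = [(x + dx, y + dy)
                     for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        rng.shuffle(neighbors)  # randomize which walls get knocked out
        for nxt in neighbors:
            if nxt in passages and nxt not in visited:
                visited.add(nxt)
                passages[(x, y)].add(nxt)   # open the wall both ways
                passages[nxt].add((x, y))
                queue.append(nxt)
    return passages

maze = bfs_maze(5, 4, seed=42)
```

One thing worth checking in a model's answer: BFS tends to produce short, branchy corridors radiating from the start, whereas the classic DFS backtracker gives long winding passages, so which traversal the model picks noticeably changes the maze's character.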