🇫🇷 Mistral AI's NEW 22B Coding Model with Code Inpainting 🎨 Beats DeepSeekCoder 33B!

  • Published: 29 Sep 2024
  • Meet Codestral, the game-changing code generation model by Mistral AI! This powerful tool assists developers with code completion and interaction through an easy-to-use API. Codestral surpasses the competition, even beating DeepSeek Coder 33B and Llama3 70B! Unlock your coding potential and boost your productivity with Codestral.
    Tell us what you think in the comments below!
    Maxime Tweet: x.com/maximela...
    Mistral Blog Post: mistral.ai/new...
    La Plateforme (use Codestral FREE): chat.mistral.a...
    Hugging Face Card (weights): huggingface.co...
    -----------------
    This video contains affiliate links, meaning if you click and make a purchase, I may earn a commission at no extra cost to you. Thank you for supporting my channel!
    My 4090 machine:
    amzn.to/3QMvE4s - MSI 4090 Suprim Liquid X 24G (best linux compatibility)
    amzn.to/3V5R0My - Corsair 1500i PSU
    amzn.to/4dIwybZ - 12VHPWR Cables that DONT MELT!
    Tech I use to produce my videos:
    amzn.to/4bN5eaR - Samsung T7 2TB SSD USB-C
    amzn.to/4dJFHky - SanDisk 32GB USB-C flash drive
    amzn.to/44LHZeG - Blue XLR Microphone
    amzn.to/3ULTT3N - Focusrite Scarlett Solo USB-C to XLR interface

Comments • 82

  • @AI-Wire
    @AI-Wire 3 months ago +2

    "We all know what happened with Devin." Nice engagement bait. Just tell us what you mean. But instead, you bait us for engagement.

    • @aifluxchannel
      @aifluxchannel  3 months ago

      Thanks for the feedback, I assumed it was well known that Devin was caught faking their demo about a week after announcing their model.

  • @JoeBrigAI
    @JoeBrigAI 4 months ago +5

    Looks good. Let's see it in a real workflow.

    • @aifluxchannel
      @aifluxchannel  4 months ago +2

      What would you like to see? Webdev, Solidity / web3? I'm all ears!

  • @southcoastinventors6583
    @southcoastinventors6583 4 months ago +7

    Nice video and test of Codestral, but if you're going to do a snake implementation or some other visual program, please run it. It needs some pizazz. Also, it's great to have some competition from Europe; I always look forward to what Mistral releases.

    • @aifluxchannel
      @aifluxchannel  4 months ago +4

      Thanks for the feedback! I wanted to keep the video under 20 min! Will do a full demo next time.

  • @ppbroAI
    @ppbroAI 4 months ago +10

    Yup, it's really good. Tried it in 4-bit; I like its explanations so far.

    • @aifluxchannel
      @aifluxchannel  4 months ago

      Great to hear! I can't wait to try 8 bit quants once I get back to my GPU machine! :)))

    • @OMGanger
      @OMGanger 4 months ago

      Any suggestions on something better than GPT-4o? I feel like it's not that hard to run a tree, and retrieve and dump context at each node along it.
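The tree-retrieval idea above can be sketched as a BFS over a project directory that collects a context snippet at each file node. This is an illustrative toy (my reading of the comment, not any specific tool's API):

```python
import os
from collections import deque

def dump_context(root, max_chars=200):
    """BFS over a project tree, collecting a context snippet at each file node."""
    snippets = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if os.path.isdir(node):
            # Enqueue children so the walk stays breadth-first.
            queue.extend(os.path.join(node, child) for child in sorted(os.listdir(node)))
        else:
            with open(node, errors="ignore") as f:
                snippets[os.path.relpath(node, root)] = f.read()[:max_chars]
    return snippets
```

Each snippet could then be fed to the model as per-node context during retrieval.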

  • @JakubHohn
    @JakubHohn 4 months ago +6

    I really like the coding AIs, but what feels like a great downside is that none of them are capable of CRUDing (create, read, update, delete) files directly. Once they can do that, I think they will be radically more useful.

    • @aifluxchannel
      @aifluxchannel  4 months ago +3

      Good point! I'll add this in the next video. I've noticed these models even struggle to string together relatively simple TypeScript / React apps.
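For anyone curious what "CRUDing files" through a model could look like: a minimal sketch of a sandboxed tool layer an LLM agent might call. The class and method names are hypothetical, not any shipping product's API:

```python
import os
import tempfile

class FileTools:
    """Minimal file CRUD tools an LLM agent could be given (hypothetical tool layer)."""

    def __init__(self, root):
        self.root = os.path.abspath(root)

    def _path(self, name):
        # Resolve the path and keep the agent sandboxed inside root.
        path = os.path.abspath(os.path.join(self.root, name))
        if not path.startswith(self.root):
            raise ValueError("path escapes sandbox")
        return path

    def create(self, name, content):
        with open(self._path(name), "x") as f:  # "x" fails if the file exists
            f.write(content)

    def read(self, name):
        with open(self._path(name)) as f:
            return f.read()

    def update(self, name, content):
        with open(self._path(name), "w") as f:
            f.write(content)

    def delete(self, name):
        os.remove(self._path(name))

# Usage: the agent would invoke these as tool calls.
root = tempfile.mkdtemp()
tools = FileTools(root)
tools.create("app.py", "print('hi')\n")
tools.update("app.py", "print('hello')\n")
```

The sandbox check matters: an agent that can write arbitrary paths is a security problem, not a productivity feature.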

  • @siegfriedcxf
    @siegfriedcxf 3 months ago +1

    They didn't include CodeQwen1.5-7B-Chat; it actually scores higher on HumanEval than Codestral and is way smaller (7B vs 22B). I tried both; CodeQwen is actually better.

    • @aifluxchannel
      @aifluxchannel  3 months ago

      I haven't tried CodeQwen yet, but I've definitely been impressed with Qwen 1.5 - what kind of coding do you do with this model?

  • @m12652
    @m12652 4 months ago +1

    There have been so many changes in JavaScript, HTML, and CSS in the last couple of years; why would a web dev want to use a tool that is only trained to 2001...

    • @aifluxchannel
      @aifluxchannel  4 months ago

      Base reasoning is key, because it means finetuning on top of newer javascript docs / code is even easier and translates to solid performance after the fact.

    • @m12652
      @m12652 4 months ago

      @@aifluxchannel And yet every coder AI model I've tried has produced such flaky code it hurts to read it. Even taking into account that they might not be trained on new functionality.

  • @cd92606
    @cd92606 4 months ago +2

    Excellent overview. Personally my goal is ultimately to only use locally running models, so this is an exciting step!

    • @aifluxchannel
      @aifluxchannel  4 months ago

      Which models are you planning to run locally!?

  • @justindressler5992
    @justindressler5992 4 months ago +2

    Cool Mandelbrot set; that's the only use case I have for a code gen. Literally the most useful code ever. In my entire career of 30 years, I can't say I ever needed, or even felt the urge, to write a Mandelbrot set.
    Why don't people use real-life tasks, like writing a React login form with unit tests, e2e tests, and backend verification with a Node Express server and database, again with unit tests? Have it explain the security techniques used to protect against hacking and to protect credentials. This is needed in almost every app.
    Until these things can be done flawlessly (passwords encrypted in the DB, TLS-enabled connections, data validation to avoid code injection, 2FA, CORS, SSO with Google, secure session handling, a DB account schema, RBAC, and so on), they won't be replacing anyone.

    • @aifluxchannel
      @aifluxchannel  4 months ago +3

      I generally like to stick to tasks that a human could do, but also tasks that don't take too much time to demo. I generally find that a lot of coding models will "explain away" things they're unsure how to actually implement with pseudo code or explanations of "best practices" - but also because they're just regurgitating documentation when that happens. What else would you like me to focus on / change in future videos when I'm evaluating coding performance?
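One item from the security checklist above, storing passwords hashed rather than in plaintext, is a good concrete test for these models because it is easy to get subtly wrong. A minimal sketch using Python's standard-library scrypt (the cost parameters shown are typical examples; tune them for production):

```python
import hashlib
import secrets

# Typical example parameters; raise them as hardware allows.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024)

def hash_password(password, salt=None):
    """Return (salt, digest) for DB storage, with a fresh random salt per user."""
    salt = salt or secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, digest

def verify_password(password, salt, digest):
    """Recompute the hash and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return secrets.compare_digest(candidate, digest)

salt, digest = hash_password("hunter2")
```

`secrets.compare_digest` avoids timing side channels that a naive `==` comparison can leak.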

  • @moak4052
    @moak4052 4 months ago +1

    Which AI do you recommend for coding?

    • @aifluxchannel
      @aifluxchannel  4 months ago

      I generally use DeepSeek Coder 33B and GPT-4 ;)

  • @garrettbates2639
    @garrettbates2639 4 months ago +3

    I feel like I missed something about Devin.

    • @aifluxchannel
      @aifluxchannel  4 months ago +3

      Devin turned out to have faked their demo, and in reality was actually quite far away from "replacing software engineers" with AI ;)

    • @garrettbates2639
      @garrettbates2639 4 months ago +1

      @@aifluxchannel Ahhh. Makes sense. Not much better than repeatedly prompting other models, I imagine?
      That's unfortunate, but at least it spawned some open source projects to try and do what they pretended to do, I suppose.

  • @Arcticwhir
    @Arcticwhir 4 months ago +1

    Doing some testing, it can be quite lazy and its creativity is low, although its coding abilities are definitely sharp and I have yet to get any bugs. The way I would use this would be for autocomplete and pseudocode (you have to be quite detailed).

    • @aifluxchannel
      @aifluxchannel  4 months ago

      Interesting, thanks for sharing your results. Curious what terms / attributes you use to measure how "creative" a coding LLM is? This might help me improve how I test models in the future!

  • @onoff5604
    @onoff5604 4 months ago +2

    Many thanks for the detailed coverage of the topic.

  • @hjups
    @hjups 4 months ago +1

    An interesting model, but unimpressive in my testing. Although it seems to be dependent on the language and problem difficulty: high-resource languages with simpler problems are more likely to succeed.
    Coming from the computer architecture side (hardware design), I always test the models on low-level C and Verilog problems (relatively simple, due to low expectations). GPT-3.5 and Llama3-70B succeeded more often than not, but Codestral failed all of my test cases. In fact, Codestral broke math by insisting that a*b == a+b if b is odd, else a random number (whatever was previously stored). When I pointed out the contradiction, it only doubled down. Llama3-70B and GPT-3.5 have never failed that badly for me.
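The "broken math" claim above is easy to falsify mechanically. As a quick sanity check (integer operands assumed): a*b == a+b implies a = b/(b-1), which is an integer only for b = 2, so for odd b the identity never holds:

```python
# Search small integers for cases where the model's claimed identity
# (a*b == a+b when b is odd) actually holds. It never does.
holds = [(a, b) for a in range(1, 100) for b in range(1, 100, 2) if a * b == a + b]
fails = [(a, b) for a in range(1, 100) for b in range(1, 100, 2) if a * b != a + b]
```

The only integer solution to a*b == a+b at all is a = b = 2, and 2 is even, so every odd-b pair is a counterexample.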

    • @aifluxchannel
      @aifluxchannel  4 months ago

      It's been a while since I've written Verilog, but that's definitely an interesting edge case to test Codestral with. What kind of work do you generally use Llama3-70B to assist / accelerate?

    • @hjups
      @hjups 4 months ago

      @@aifluxchannel It's a fun yet frustrating language.
      I haven't been using Llama3-70B to assist with any hardware tasks; it still fails on anything useful (it only succeeds at simple tasks).
      GPT-4 can sometimes generate more complicated Verilog, but it usually requires manual correction. It's mostly useful for generating sub-function behavior in tooling (C and Python). That still requires manual guidance, but it speeds up development by ~10x. I would be more hopeful about Llama3-400B, but I guess that won't be released.

  • @AaronALAI
    @AaronALAI 4 months ago +2

    I've been having great success with WizardLM's Mixtral 8x22B model for coding.
    My workflow is pretty simple: I use text-generation-webui to talk to my models and the Spyder IDE in another window, and I just talk to the LLM like a normal person.

    • @aifluxchannel
      @aifluxchannel  4 months ago +1

      It'll be interesting to see how similar the evals for those two models are. Given they're the same size, I wonder if this is just a super-sampling of one of the "experts" from their 8x22B model.

    • @AaronALAI
      @AaronALAI 4 months ago

      @@aifluxchannel Ooh, interesting hypothesis. I noticed it was a 22B model they released and wondered if it was related in some way to their 8x22B model.

  • @tapu_
    @tapu_ 4 months ago +1

    You should test out if it can write and run DreamBerd, the greatest language ever.

    • @aifluxchannel
      @aifluxchannel  4 months ago

      Hahaha, I can't tell if this is a joke or a real programming language?

  • @linklovezelda
    @linklovezelda 4 months ago +2

    Check your title bro

  • @peterwood6875
    @peterwood6875 4 months ago +1

    I like to use Claude 3 Haiku for coding. I can always use Opus for things like coming up with the coding project itself, or to ask tricky technical questions. I talk to Haiku about the implementation and the plan, then get it to come up with some unit tests, then get it to write the code. Getting it to think a bit before generating the code seems to make it generate good code.

    • @aifluxchannel
      @aifluxchannel  4 months ago +1

      Thanks for sharing! Have you used the new phi-3 as well? Curious what kind of coding you're using this for?

    • @peterwood6875
      @peterwood6875 4 months ago

      @@aifluxchannel I often have conversations with Claude about maths and physics. Writing some code to do some calculations is a good way to familiarise oneself with the relevant concepts, and it's more fun than doing calculations by hand with pen and paper. A recent project was to implement a homomorphism and representations of Lie groups that are related to the quantisation of spin. I haven't tried Phi-3. It looks like some versions have a decent context length, but I find that Claude's context length isn't quite enough for the way I use it.

  • @VastCNC
    @VastCNC 4 months ago +1

    I'd like to see a model tuned to a specific language other than Python and JS derivatives. Elixir is a prime candidate, with an excellent documentation library (HexDocs).

    • @jonmichaelgalindo
      @jonmichaelgalindo 4 months ago +1

      Base model training literally needs hundreds of millions of lines of code.

    • @aifluxchannel
      @aifluxchannel  4 months ago

      It would be interesting to train the model with as little documentation / English commentary and context as possible, to see if a more accurate or actionable model would come of it.

    • @VastCNC
      @VastCNC 4 months ago

      @@aifluxchannel Do you think a fine-tune would be sufficient? I think with Elixir, outside of the documentation, the open source repositories would be of higher quality because of the skill involved in becoming productive, compared with Python and JS.

  • @OMGanger
    @OMGanger 4 months ago +1

    Phi has 128k context and is only 4B?

    • @aifluxchannel
      @aifluxchannel  4 months ago +1

      It's more about how you use the context window than its length ;)

  • @sevilnatas
    @sevilnatas 4 months ago +2

    Wait, what happened to Devin?

    • @aifluxchannel
      @aifluxchannel  4 months ago +2

      The demo was fake; it wasn't actually as capable as its creators claimed.

    • @sevilnatas
      @sevilnatas 4 months ago +1

      @@aifluxchannel Ah, crazy! I guess it was good enough for Microsoft.

  • @lel7531
    @lel7531 4 months ago +1

    Why aren't you running the code?

    • @aifluxchannel
      @aifluxchannel  4 months ago +1

      I can do this in livestreams, but for model review videos it takes too much time. Thanks for the suggestion.

  • @hobologna
    @hobologna 4 months ago +1

    Code inpainting is a brilliant concept!

    • @aifluxchannel
      @aifluxchannel  4 months ago

      I think it could become a really popular way to interact with coding models, especially if you could point / direct where you want it to focus in a codebase with comments.
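Directing the model with an in-code marker maps naturally onto fill-in-the-middle (FIM) APIs, which take a prefix and a suffix and generate the middle. A sketch of the marker-splitting step; the `# <FILL_HERE>` marker is an illustrative convention of this sketch, not Codestral's actual syntax:

```python
def split_at_marker(source, marker="# <FILL_HERE>"):
    """Split source code into (prefix, suffix) around an inpainting marker.

    The resulting pair is what a fill-in-the-middle endpoint would consume.
    """
    prefix, found, suffix = source.partition(marker)
    if not found:
        raise ValueError("marker not found in source")
    return prefix, suffix

# Usage: mark the region you want the model to fill in.
snippet = """def add(a, b):
# <FILL_HERE>
    return result
"""
prefix, suffix = split_at_marker(snippet)
```

The prefix and suffix would then be sent as the two halves of a FIM completion request, and the model's output spliced back in between them.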

  • @maloukemallouke9735
    @maloukemallouke9735 4 months ago +1

    Thanks for this experiment.

  • @jonmichaelgalindo
    @jonmichaelgalindo 4 months ago +1

    But can it write ffmpeg commands?

    • @mirek190
      @mirek190 4 months ago +1

      Yes.
      Also, if you paste in the newest documentation, then it works even better.

    • @jonmichaelgalindo
      @jonmichaelgalindo 4 months ago

      @@mirek190 Have you tried? I guarantee you haven't. Not even GPT-4 can do anything more complicated than mp3 -> ogg, and even struggles with something simple like that.

    • @aifluxchannel
      @aifluxchannel  4 months ago +1

      GPT-4 and Mixtral 8x7B are particularly good with these commands. This was one of the first things that really impressed me about these models.

    • @aifluxchannel
      @aifluxchannel  4 months ago +1

      It can do things much more complicated! You should try it out.

    • @jonmichaelgalindo
      @jonmichaelgalindo 4 months ago

      @@aifluxchannel We must be prompting it differently then. :-/
      For example (real example): I wanted to input my two camera videos, convert them from fisheye to equirectangular, combine them with one on the left and the other on the right (stereo), crop 120 pixels from left and right of both, move the right down 180 pixels (bad lens alignment from manufacturer), then scale the entire output to no more than 8K. GPT-4 was nowhere near being able to write the command. (I never did figure it out. I'm doing those operations manually in Blender.)

  • @PythonAndy
    @PythonAndy 4 months ago +1

    thanks for the vid ♥

    • @aifluxchannel
      @aifluxchannel  4 months ago

      You bet! Let us know what you'd like to see more of!

  • @firstlast493
    @firstlast493 4 months ago +1

    How about AutoCoder 33B?

    • @aifluxchannel
      @aifluxchannel  4 months ago

      We can test this soon! Is this your go-to coding model?

    • @firstlast493
      @firstlast493 3 months ago

      @@aifluxchannel No. There are just very few videos about this model.

  • @dkracingfan2503
    @dkracingfan2503 4 months ago +1

    Yes, it beats it!

    • @aifluxchannel
      @aifluxchannel  4 months ago

      Pretty exciting, isn't it? What kind of finetunes do you want to see done to this Mistral model?

  • @pn4960
    @pn4960 4 months ago +1

    super cool!

    • @aifluxchannel
      @aifluxchannel  4 months ago

      Thanks, we're glad you liked it!

  • @pigeon_official
    @pigeon_official 4 months ago +7

    But GPT-4o is barely decent at coding; I can't imagine the open source stuff will be remotely useful if GPT-4o can't do 90% of coding tasks more complex than an intro-to-coding-course type thing.

    • @aifluxchannel
      @aifluxchannel  4 months ago +5

      I do generally agree that GPT-4o (outside of OpenAI's demo) is basically useless for real coding tasks, especially as a co-pilot.

    • @brulsmurf
      @brulsmurf 4 months ago

      The "open source stuff" isn't lagging behind. And yes, there are a lot of problems with using LLMs for coding tasks; you need to be very careful.

    • @yongamamkolokotho9904
      @yongamamkolokotho9904 4 months ago

      I was creating a BFS-generated maze using 4o; so far, for me, it's impressive.

    • @handsanitizer2457
      @handsanitizer2457 4 months ago

      He means for anything complex @@yongamamkolokotho9904
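The BFS maze generation mentioned above is a reasonable quick test to give a model. A sketch of one way to do it (a randomized BFS spanning tree over a grid; classic generators more often use DFS, but BFS also yields a perfect maze):

```python
import random
from collections import deque

def bfs_maze(width, height, seed=0):
    """Carve a maze as a randomized BFS spanning tree of the width x height cell grid.

    Returns the set of carved passages, each a frozenset of two adjacent cells.
    A spanning tree of w*h cells always has exactly w*h - 1 passages.
    """
    rng = random.Random(seed)
    passages = set()
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        neighbours = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        rng.shuffle(neighbours)  # randomize which walls get carved
        for nx, ny in neighbours:
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in seen:
                seen.add((nx, ny))
                passages.add(frozenset({(x, y), (nx, ny)}))
                queue.append((nx, ny))
    return passages
```

Because every cell is reached exactly once, the result is a perfect maze: exactly one path between any two cells, with no loops.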