Deepseek-R1-Lite (Tested): This OPENSOURCE Model BEATS O1 & CLAUDE 3.5 SONNET!?

  • Published: 21 Nov 2024

Comments • 45

  • @robinmountford5322 · 19 hours ago · +2

    Sonnet is a force to be reckoned with when it comes to coding.

  • @DGFilmsNYC · 1 day ago · +11

    I gave Deep Think a system prompt before I rewrote your confetti prompt, because when you prompt Cline you phrase it like "write a website in HTML, CSS, and JS", whereas in your test you say "you can use CSS and JS". I got the confetti in one shot; just press "run HTML" in the window... Let's add a second test, update the questions, and A/B test the tests to really test these models.

  • @vdbv0 · 1 day ago · +8

    Oh lord! Just tested it, it's wild! Loving it! I'm hoping the API cost stays the same as now; if so, I will forget Heroku quickly!

    • @ANSHU61936 · 1 day ago

      What do you mean, you will forget Heroku?
      Do they also have some kind of Bedrock alternative?

    • @vdbv0 · 1 day ago

      @ANSHU61936 Claude Haiku, sorry! Wrote too quickly 🙏

    • @luizbueno5661 · 20 hours ago

      I'm curious how you would drop Heroku, and why you could do that once DeepSeek's API is cheap.

    • @ANSHU61936 · 20 hours ago

      @vdbv0 Same question again: why are you dropping Heroku? Do they also provide some kind of AI model?

  • @LetYourLightShine5218 · 1 day ago · +3

    The AI's response to Test #3 was correct but it would have been interesting if it had been able to further speculate that C possibly was the other person playing table tennis with E unless E was playing solo with the table against the wall.

  • @bot. · 1 day ago · +2

    Now this is interesting to see. Finally, a new model showing highly promising results. Well, let's see what I think of it. Also, forgive me if this is a bit chaotically structured; I am writing it as I watch the video. With that out of the way, let us get started!
    As weird as it is, I would consider test one neither a fail nor a pass, as what the model went through eerily resembled a human being stunned by a question and not seeing the logical answer immediately. It's hard to say how this can be improved, but I theorise the problem may solve itself once the model is given more time to think without rushing. Maybe even having it change perspectives at some point?
    Moving on to test #4, we can tell that it did objectively fail, but the reasoning chain was obviously halted prematurely, presumably by the system itself to limit the number of tokens spent on thinking. Smart choice by DeepSeek, yet obviously a performance limiter in cases like these. I would love to see what the answer would be if it were given as much time as it wants. Then again, if they release the model as open source down the line, these compute problems can easily be solved by the users themselves (assuming, of course, that it is not an absurdly big model, which unfortunately does seem to be the case here. I would love to be wrong about the size, though.)
    Yet again, more compute time will not solve all the problems, as we can see from test #9, where it was unable to create a proper website for the confetti. Unfortunately, we did not see the code run outside of DeepSeek's own environment, which may itself be the limiting factor in this case rather than the model. For more rigour, the code should also have been run through more conventional means, just as the Python code was executed externally. (Also, do pardon me if my assumptions about DeepSeek's environment are incorrect; I am not that familiar with web frameworks or their execution.) I would say something similar for test #12, but I did not catch whether the DeepSeek environment was used, so I can only speculate.
    Sorry for the long paragraph, but moving on to test #11, I would consider it a fail from an artistic perspective, but the model was most likely not trained on SVG creation, so expectations were rather low. However, it is still impressive that it created the general shape of a butterfly.
    All in all, a very, very exciting model, especially if it can be run on most systems.

  • @aculz · 1 day ago · +3

    Well, this is a great result for a model named "Lite" that can almost beat o1, not just o1-mini. I'm pretty sure the "Large" one might beat Sonnet as well, so we could have the greatest open-source model, and one much, much cheaper than Sonnet. Can't wait for the next brilliant move from this company.

  • @jargoti20 · 18 hours ago

    Interesting comparison. I would love to see the API come out so we can implement it in our own apps.

  • @MeinDeutschkurs · 1 day ago

    Have you tested AYA? Great for structured outputs.

  • @AnugrahPrahasta · 1 day ago · +2

    WOW. Finally, deepseek!

  • @LetYourLightShine5218 · 1 day ago · +2

    While the "thinking outside the box" is impressive, I think the AI failed Test #1 for 2 reasons. First, the AI said: "there doesn't appear to be any country with an official English name that ends in 'lia.'"
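The quoted claim is easy to check by filtering country names by suffix. A minimal sketch, assuming a small hand-picked sample of common English short names (not an exhaustive or official list of countries):

```python
# Hand-picked sample of common English short-form country names.
# This is NOT an exhaustive list; it only needs one counterexample.
SAMPLE_COUNTRIES = ["Australia", "Austria", "India", "Mongolia", "Somalia", "Italy", "Bolivia"]


def ends_with(names, suffix):
    """Return the names that end with `suffix`, ignoring case, in input order."""
    return [n for n in names if n.lower().endswith(suffix.lower())]


print(ends_with(SAMPLE_COUNTRIES, "lia"))  # → ['Australia', 'Mongolia', 'Somalia']
```

Even this tiny sample yields three matches, which is the commenter's point: the model's "no such country" answer is refuted by a single example.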

  • @PrinzMegahertz · 16 hours ago

    With regard to question 2, shouldn't C be playing table tennis with E? If no one else is in the house and C is not playing, who is E playing with?

  • @TURKLERDIZIS · 1 day ago · +2

    Claude 3.5 Sonnet is the best choice for coding.

  • @perfectartiste6332 · 1 day ago

    good one, this will really be a game changer

  • @stephensamuel2770 · 1 day ago

    First to view, first to comment.
    This is quite impressive. I have used it and the results are amazing.

  • @BeastModeDR614 · 1 day ago

    Nice, open models are getting close. Can't wait until we can run Cline locally and build full-stack applications with no limits.

  • @TheProtein83 · 1 day ago

    Great job. I disagree with the opinion about CoT and coding. In the case of complicated architectures, thinking step by step should provide better results.

  • @luizgustavs · 1 day ago

    I really like the art style of the images at the beginning of your videos. Would you mind sharing the prompt you use to get this art style? It would be greatly appreciated.

    • @AICodeKing · 1 day ago · +2

      It's very basic.. Something like "A panda in a forest, in front of a campfire, cinematic, anime style".

  • @HikaruAkitsuki · 3 hours ago

    I wonder if this is usable for writing a thesis. Anyway, I don't think its monologue is necessary.

  • @michaelrichey8516 · 1 day ago

    Logically, your 3rd question has a better answer than "unknown".
    Statement 1 says there are 5 people in a house and names them. 4 people are given activities, with the 5th (C) not being mentioned.
    E, however, is playing table tennis, a 2-player game. Logically, E is playing with C, because there are 5 people in the house and table tennis cannot be played alone.
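The deduction above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: only "five people in the house", "C is not mentioned" and "E plays table tennis" come from the comment. The activities for A, B and D are hypothetical placeholders, and the rule that a two-player activity with only one listed participant must take the single unmentioned person as partner is the commenter's reasoning made explicit.

```python
PEOPLE = ["A", "B", "C", "D", "E"]
# Hypothetical placeholders for A, B and D; only E's activity comes from the comment.
KNOWN = {"A": "reading", "B": "cooking", "D": "sleeping", "E": "table tennis"}
# Activities that cannot be done alone (assumption: table tennis needs a partner).
TWO_PLAYER = {"table tennis"}


def infer_missing(people, known, two_player):
    """Assign the one unmentioned person to a two-player activity that has no
    listed partner. Assumes nobody else is in the house and everyone listed is
    busy with their own stated activity. Returns {} if not uniquely determined."""
    missing = [p for p in people if p not in known]
    activities = list(known.values())
    needs_partner = sorted(
        {a for a in activities if a in two_player and activities.count(a) == 1}
    )
    if len(missing) == 1 and len(needs_partner) == 1:
        return {missing[0]: needs_partner[0]}
    return {}  # the "unknown" answer


print(infer_missing(PEOPLE, KNOWN, TWO_PLAYER))  # → {'C': 'table tennis'}
```

If E's activity were something that can be done alone, the function falls back to `{}`, which corresponds to the "unknown" answer the video accepted.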

  • @idea_list · 1 day ago

    I wonder if the answer to question 3 should be "playing table tennis". It's hard to imagine that E is playing table tennis solo, right...?

  • @dixalex02 · 1 day ago

    I'm having trouble creating a Markdown file of the PixiJS API. Something about the URL syntax prevents it from being scraped. Any advice?

  • @rassular · 1 day ago

    Can you test the new Mistral Large 2411?

  • @eado9440 · 1 day ago

    An open-source model this good is crazy; I just hope it's not like 500 GB.

    • @aculz · 1 day ago

      It's okay if it's 500 GB; if we can't run it locally, we can use theirs, which is crazy cheap compared to Sonnet and GPT.

  • @TawnyE · 1 day ago

    E
    Edit: I just noticed the top commenter with the most hearted comments; that is a pog

  • @thatbeezie · 1 day ago

    How do you use this open-source model in, like, Aider or VS Code extensions/apps?

    • @aculz · 1 day ago

      You need to pay to use their API.

  • @DouhaveaBugatti · 1 day ago

    Suppose we combine it with the coder model☠️

  • @flutterflowexpert · 1 day ago

    I think you need a new benchmark 😂

  • @warlockassim4240 · 1 day ago

    First! And bro, please answer: how do I make Aider detect my local project?

    • @AICodeKing · 1 day ago · +1

      It should probably detect it automatically.

    • @Gorops · 1 day ago · +1

      Are you running it in the project folder/repository?

    • @wasimdorboz · 1 day ago

      @AICodeKing I am using Linux and it doesn't. I think I should do /save and then /load, no?

    • @wasimdorboz · 1 day ago

      @Gorops Yep

  • @다루루 · 1 day ago

    😊

  • @gabrielkasonde367 · 1 day ago

    Yoooooo😂🎉

  • @randomlettersqzkebkw · 1 day ago

    openai has no moat

  • @shay5338 · 1 day ago

    Haha, 8th one to comment. It would be cool if you showed how we can access these LLMs for free without any limits.

    • @aculz · 1 day ago · +1

      Well, just install it in Ollama or LM Studio and use it locally. But make sure you have a powerful GPU, or it will run very slowly.