DeepSeek R1 Fully Tested - Insane Performance

  • Published: Feb 1, 2025

Comments • 2.6K

  • @milesmitchell3193
    @milesmitchell3193 6 дней назад +3037

    An open source AI took OpenAI’s job. That’s poetic justice.

    • @techviking23
      @techviking23 6 дней назад

      unfortunately it's Chinese and has censorship re Chinese political issues

    • @Dordordord
      @Dordordord 6 дней назад +200

      An open AI overtook the most famous closed AI company 'OpenAI'😂

    • @brucelin8950
      @brucelin8950 6 дней назад +39

      From a dictator's country with zero privacy and broken patent laws. 😂

    • @oot007
      @oot007 6 дней назад +83

      @@brucelin8950 copehard in your yoo essay dreams. You are still at stage 1. 4 more to go. 🤣

    • @Goodfellow1234567
      @Goodfellow1234567 6 дней назад +123

      OpenAI blocked China from accessing it. Deepseek made its model / source code open to all. What an irony!

  • @tengdayz2
    @tengdayz2 10 дней назад +1576

    It actually fills in the thinking gaps, allowing you to follow along and learn with it. That is super cool

    • @patruff
      @patruff 10 дней назад +57

      This is very underrated, seeing the thought process is great. Having to go through these as a human is annoying but would love to see if running it on its own thought chains could detect issues.

    • @tengdayz2
      @tengdayz2 10 дней назад +4

      @@patruff ollama has it

    • @patruff
      @patruff 10 дней назад +1

      @@tengdayz2 what do you mean? Is there a command or something? I've used ollama before but don't know how to add the response as an input with a message like "critique this response and try harder" or something

    • @tengdayz2
      @tengdayz2 10 дней назад +2

      @@patruff you can use the ollama run command in the shell where it's installed to pull the model. Then use the model to answer this question :). I prefer to do my own digging, but I encourage you to satisfy your own curiosity.

    • @patruff
      @patruff 10 дней назад +2

      @tengdayz2 okay I'm sort of confused but I'm guessing you mean just conversing with the model, I do that, it's good, but I'm looking for scripting out the capture of thought chains so I can use it for fine-tuning later
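
A hedged sketch of the scripted capture being asked about above, assuming a local Ollama server on its default port, a pulled deepseek-r1 tag, and Ollama's documented /api/generate endpoint. The helper name and output file are illustrative, not anything from the thread:

```python
import json
import re
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def capture_thought_chain(prompt: str, model: str = "deepseek-r1:7b") -> dict:
    """Query a local R1 distill and split its <think> trace from the final answer."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    text = resp.json()["response"]
    # R1-style models emit their reasoning inside <think>...</think> before the answer.
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    thinking = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return {"prompt": prompt, "thinking": thinking, "answer": answer}

if __name__ == "__main__":
    record = capture_thought_chain("How many Rs are in 'strawberry'?")
    # Append each record to a JSONL file for later fine-tuning or critique passes.
    with open("thought_chains.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
```

Feeding record["thinking"] back in with a "critique this reasoning" prompt would then be a one-line extension of the same loop.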

  • @norbertschmidt
    @norbertschmidt 10 дней назад +1467

    I like the reasoning more than the actual response, totally fascinating

    • @SN-kk2bl
      @SN-kk2bl 8 дней назад +61

      absolutely love the reasoning. I often have a hard time thinking of various edge cases and this eliminates that and allows me to think more creatively.

    • @jordanfarr3157
      @jordanfarr3157 7 дней назад +18

      Used it for difficult, earnest work and I also found the internal monologue to be every bit as helpful as the output response. It opened up new conversation opportunities that brought out completely new responses I otherwise couldn't have generated without that information. Seeing the monologue is way, way more important than I thought -- maybe because I didn't get to see it from o1.

    • @Fireneedsair
      @Fireneedsair 6 дней назад +17

      Well, studying reasoning is WAY MORE USEFUL but so many lazy humans out there

    • @SN-kk2bl
      @SN-kk2bl 6 дней назад +3

      @Fireneedsair good point

    • @93_SUPREME
      @93_SUPREME 6 дней назад +1

      Have you all looked up what they did with the moth brain?

  • @Reydriel
    @Reydriel 5 дней назад +172

    The addition of the reasoning process is so SO much more valuable to the user than just giving a response only, this shit's crazy

    • @CoronaryArteryDisease.
      @CoronaryArteryDisease. 5 дней назад +3

      I agree!

    • @HermanWillems
      @HermanWillems 4 дня назад

      That is correct, CoT is nice, but it's not always desired. You can pump up a bad model with this technique. Now imagine having a properly good model from OpenAI/Anthropic and slapping CoT (chain of thought) onto it. That would surpass DeepSeek by a lot. It's just a post-processing technique, something you want to be able to turn on and off. It's not always desired.
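
The on/off toggle described above can be approximated with nothing more than a prompt switch against any OpenAI-compatible chat endpoint. A minimal sketch; the base URL and model name are placeholders, and prompted chain-of-thought is only a rough stand-in for R1's trained-in reasoning:

```python
import requests

# Placeholder: any OpenAI-compatible chat endpoint (Ollama serves one under /v1).
BASE_URL = "http://localhost:11434/v1/chat/completions"

def ask(question: str, chain_of_thought: bool) -> str:
    """Same model, same question; only the reasoning instruction is toggled."""
    system = (
        "Reason step by step inside <think> tags, then state the final answer."
        if chain_of_thought
        else "Answer directly and concisely, with no intermediate reasoning."
    )
    resp = requests.post(BASE_URL, json={
        "model": "my-base-model",  # placeholder name, not a real tag
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```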

    • @robertmuckle2985
      @robertmuckle2985 3 дня назад +2

      @@HermanWillems Like...why read the book when you can judge its cover?

  • @sMcLouder
    @sMcLouder 6 дней назад +217

    I just tried it locally and asked it a few questions from my personal field of expertise. It provided me, in real time, with answers at the level of a PhD-grade expert in this field. Also, it can not only explain things plausibly, but it is able to apply this knowledge to solve problems in a meaningful way. I am very impressed and, frankly, scared.
    So now that I have a complete set of PhD-level experts in all fields on my graphics card, I just need to make good use of the team. Resources on par with an enterprise-level talent pool at hand. I struggle to fathom what this implies.

    • @biaochen4374
      @biaochen4374 5 дней назад +7

      This tech revolution is just too fundamental for society to integrate into our daily lives. It might take decades to accelerate productivity in all fields.

    • @xexaderatx9056
      @xexaderatx9056 5 дней назад +1

      It's crazy how u have that power in ur hand, lol

    • @americangrubs664
      @americangrubs664 5 дней назад +1

      It blew my mind!!

    • @americangrubs664
      @americangrubs664 5 дней назад +1

      Wasn’t it sooooo human like?

    • @poornoodle9851
      @poornoodle9851 5 дней назад +5

      What level of education do you need to ask AI the question and what level is required to understand AIs answer? Using deepseek I wouldn’t be able to replace you. I wouldn’t know what prompt to create and I wouldn’t understand the answer.

  • @GigaCrafty
    @GigaCrafty 10 дней назад +156

    following the chain of thought is such a satisfying way of learning

  • @Darkmatter321
    @Darkmatter321 7 дней назад +1413

    This is a competition between the Chinese in the US and the Chinese in China.

  • @Akalin123
    @Akalin123 9 дней назад +182

    I just used it to assist with statistical signal processing coursework: estimating target trajectory and velocity from millimeter-wave radar data.
    It's not an easy assignment and the model got it wrong, but it provided ideas and steps to solve the assignment so I could easily correct it, which is much better than other LLMs.
    This is a common problem with LLMs: sometimes the output is worthless but the 'thought process' is enlightening.

    • @jordanfarr3157
      @jordanfarr3157 6 дней назад +6

      @Akalin123 love this comment, and had a similar experience in my technical domain.

  • @yanlord
    @yanlord 8 дней назад +917

    And this is a side product of a Chinese hedge fund; they did it for fun 🤣🤣🤣🤣🤣

    • @Mandom007
      @Mandom007 7 дней назад +32

      Huawei is gonna insta buy them, lol.

    • @yudogcome5901
      @yudogcome5901 7 дней назад +134

      ​@@Mandom007 DeepSeek is moving toward using Huawei GPUs; recent test results show performance 5% lower than the H100, but 75% cheaper.

    • @digidope
      @digidope 6 дней назад

      There are Western hedge funds that could easily do the same just for fun, but they don't. They just hoard their billions in some vault instead.

    • @ningwei3772
      @ningwei3772 6 дней назад +56

      The founder is a quant managing hundreds of billions on the Chinese stock market; he has fewer than 100 people and made this model as a side interest with about 5 million.

    • @bongkem2723
      @bongkem2723 6 дней назад

      @@yudogcome5901 holy cow, that's insane cost saving, and now Huawei will rise again, what a time !!!

  • @orhunyeldan9800
    @orhunyeldan9800 6 дней назад +494

    I asked DeepSeek (using the web chat) about some detailed and hard-to-grasp concepts from Roger Penrose's Conformal Cyclic Cosmology theory. It didn't contradict itself, it didn't give bullshitty nonsense answers, and it addressed all the points correctly. I am mindblown.

    • @StuggleIsSurreal
      @StuggleIsSurreal 5 дней назад +26

      Ask it about the Chinese government.

    • @isabelleryo5099
      @isabelleryo5099 5 дней назад +73

      @@StuggleIsSurreal wawa is barking

    • @sincerexu3040
      @sincerexu3040 5 дней назад +79

      @@StuggleIsSurreal Why are you always obsessed with grand narratives?

    • @bobbyqiu
      @bobbyqiu 5 дней назад

      ​@@StuggleIsSurreal media slave

    • @Tiauhui
      @Tiauhui 5 дней назад +56

      @@StuggleIsSurreal Eating too many sour grapes, that's so sad... haha

  • @annaczgli2983
    @annaczgli2983 10 дней назад +1564

    If this is not "innovation" by China, IDK what is! Well done! Credit where credit is deserved!

    • @JohnSmith762A11B
      @JohnSmith762A11B 10 дней назад +146

      Pretty staggering to imagine what they might be able to do if they were allowed to import modern GPUs.

    • @hqcart1
      @hqcart1 10 дней назад +115

      @ you would be living in another dimension if you really think they don't have GPUs sold under the table.

    • @rishhibala6828
      @rishhibala6828 10 дней назад

      ​@@hqcart1 his point is still valid; imagine what they could do if they were allowed to import them

    • @nahlene1973
      @nahlene1973 10 дней назад +66

      @@JohnSmith762A11B I remember when I went to art school there was this quote: the greatest enemy of art is the lack of limitations 😂

    • @JohnSmith762A11B
      @JohnSmith762A11B 10 дней назад +41

      @@hqcart1 Of course they do, but not in the quantities the big US frontier labs have them.

  • @Masakrer
    @Masakrer 10 дней назад +676

    You know what is even cooler? You can run the distilled R1-32b model on a mid-range personal PC and get this Tetris game done locally, plus the reasoning questions answered. This is some crazy shit when you compare it to what we could do with local models a year ago. I ran it today on an i5-13600K with 32 GB RAM and an RTX 4070 Super (12 GB VRAM), and sure, I got a "stunning" 5 tok/sec... so some tasks could take time. Yet it's able to complete the tasks you gave here, locally, on such a mediocre machine.
    We're cooked man. Like holy cow cooked.
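
The 5 tok/sec figure is roughly what back-of-the-envelope memory math predicts for that card. A sketch, assuming ~4-bit quantization and ignoring KV-cache and runtime overheads:

```python
# Back-of-the-envelope: why a 32B model crawls on a 12 GB card.
params = 32e9                 # 32B-parameter distill
bytes_per_param = 0.5         # ~4-bit quantization (e.g. Q4_K_M); an assumption
weights_gb = params * bytes_per_param / 1e9
print(f"Quantized weights: ~{weights_gb:.0f} GB")    # ~16 GB

vram_gb = 12                  # RTX 4070 Super
offloaded_gb = max(0.0, weights_gb - vram_gb)
print(f"Spills to system RAM: ~{offloaded_gb:.0f} GB")
# The offloaded layers must be touched on every token, over the PCIe bus,
# which is why throughput collapses to a few tokens per second.
```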

    • @tristanvaillancourt5889
      @tristanvaillancourt5889 10 дней назад +45

      The 8b on my RTX 3060 with a really old i7 was pumping out 45 tokens/s. It's not 32b but based on the published performance graphs, the 8b is no slouch. 5 tok/s with a 32b on a home PC is still pretty good. I'm in love with R1.

    • @mal2ksc
      @mal2ksc 10 дней назад +73

      I did it with an i5-8500 (48 GB RAM) and a 3060 (12 GB VRAM), using the 70B parameter model. It was more like two tokens a second, with a chain of thought latency of a minute or two before each answer. But yes, all of this really does run on potatoes. That's why I think embargoes just make it more expensive for China to develop these models without actually limiting what they can do. They just have to use more power, and maybe twice the time. No single answer will be fast, but that is offset by running a ton of operations in parallel.
      OpenAI is who is really cooked. Now they have to know that whatever they release, it will only be bleeding edge for three months before China replicates it and open sources it. This means the whole business model for OpenAI is non-viable. The time window to recoup return on investment is just not there.

    • @GosuCoder
      @GosuCoder 10 дней назад +8

      Yes this is what i've been spending most of my time testing

    • @holdthetruthhostage
      @holdthetruthhostage 10 дней назад +1

      LM Studio isn't working for me on AMD; I have 16 GB VRAM and 128 GB RAM

    • @Masakrer
      @Masakrer 10 дней назад +59

      Yeah, I feel pretty much like I'm back in the 90s, launching some of the first primitive 3D games on my PC and shitting myself from hype while looking at Lara Croft's triangle boobs at 6 FPS, honestly thinking it's quite good performance, lol.
      Now, when I think about how much of a leap we took in graphics, if the same (at least) happens to AI in the next few years…

  • @ccdj35
    @ccdj35 10 дней назад +169

    I love how accurate and human-like the thinking process is.

    • @93_SUPREME
      @93_SUPREME 6 дней назад +10

      They are using real human brains and growing them in lab for these

    • @syarifairlangga4608
      @syarifairlangga4608 5 дней назад +1

      @@93_SUPREME yeah they can use the prisoner brain, i mean how can they be this cheap lmao

    • @93_SUPREME
      @93_SUPREME 5 дней назад +1

      @@syarifairlangga4608 yeah it’s really disgusting shit the world is fucked

    • @93_SUPREME
      @93_SUPREME 5 дней назад +1

      @@ccdj35 this shit has become demonic and it’s just the beginning

    • @93_SUPREME
      @93_SUPREME 3 дня назад

      @@syarifairlangga4608 yup I didn’t think about that probably exactly what they’re doing

  • @whitemoon5752
    @whitemoon5752 5 дней назад +8

    Wow, simple and genius at a different level. Deepseek is amazing.

  • @mouratjamoukhanov945
    @mouratjamoukhanov945 5 дней назад +13

    I asked it a question about mechanics that the paid version of ChatGPT kept going around in circles on and apologizing; same with Gemini. This one got it on the second try, and the first time it almost had it.
    Very impressed!

  • @AntonioSorrentini
    @AntonioSorrentini 10 дней назад +701

    To be honest, this model is what we would have expected to come out of OpenAI, given the initial promises years ago. And instead… it comes from China, it is better than the best of OpenAI, it costs 60 times less, and it is truly open source! Chapeau China!

    • @ArturoGarzaID
      @ArturoGarzaID 5 дней назад +12

      It’s not better than OpenAI.

    • @ashmasterc
      @ashmasterc 5 дней назад +5

      The public models we compare are probably a couple months to years old vs internal projects.

    • @user-name3366
      @user-name3366 5 дней назад

      ​@@ArturoGarzaIDit is

    • @NecrosedNexus
      @NecrosedNexus 5 дней назад +33

      @@ArturoGarzaID you got any evidence? I don’t have a dog in the fight, but I need evidence if I’m gonna pick a preference

    • @jerome-neareo
      @jerome-neareo 5 дней назад +41

      @@ArturoGarzaID similar results, open source AND 60 times cheaper == better !

  • @2silkworm
    @2silkworm 10 дней назад +262

    should become a meme

    • @simplexj4298
      @simplexj4298 7 дней назад +5

      For Mr Trump?

    • @richardgibson1872
      @richardgibson1872 5 дней назад

      @@simplexj4298 for sam altman

    • @yestomor7673
      @yestomor7673 5 дней назад +4

      For Israel or Jeffrey?

    • @testman9541
      @testman9541 5 дней назад +3

      Censorship 🎉

    • @testman9541
      @testman9541 5 дней назад +2

      @TheMrTape Wrong; the user meant that an empty think tag is the meme 👌 because it shows the lack of thinking behind censorship 😘

  • @brianbarnes746
    @brianbarnes746 10 дней назад +686

    I'm canceling my OpenAI subscription. Seeing the thinking gives me so much more to work with. Why would anyone choose o1 unless it was much better, which it isn't?

    • @Vivaildi
      @Vivaildi 8 дней назад +21

      same

    • @izazkhan9027
      @izazkhan9027 8 дней назад +19

      Agreed.

    • @Ardano62
      @Ardano62 8 дней назад

      The integration into GitHub keeps me there for now

    • @izazkhan9027
      @izazkhan9027 8 дней назад +8

      done!

    • @TheEgeemen
      @TheEgeemen 7 дней назад +4

      O1 is user friendly and I can use it on my iOS.

  • @candyts-sj7zh
    @candyts-sj7zh 7 дней назад +34

    It's crazy how this is literally how we as humans think, even about seemingly simple problems like the marble problem. The amazing thing is we do this EXTREMELY fast, so it doesn't feel like we're going through all these steps, but we do.

    • @donkeychan491
      @donkeychan491 4 дня назад +2

      Except that very few - if any - humans can think as clearly and consistently as Deepseek.

    • @scarm_rune
      @scarm_rune 4 дня назад +1

      @@donkeychan491 not if you exclude france

  • @ucsglobal
    @ucsglobal 5 дней назад +5

    It seems they have gotten around the need for real-time responses by streaming the thinking process as it computes. Knowing we don't read all that quickly, it drops the text out at a slow pace while doing the crunching at whatever pace the hardware allows. That sidesteps the need for the biggest and best hardware, because nobody is left waiting for the answer; they're too distracted following the thinking.
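
That perceived-latency effect is easy to observe directly by streaming tokens as they are generated, for example against a local Ollama server (default port and newline-delimited JSON chunks assumed; the model tag is illustrative):

```python
import json
import requests

# Stream tokens from a local Ollama server as they are produced, so the reader
# follows the reasoning while the hardware is still crunching the rest.
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-r1:7b", "prompt": "Explain the marble problem.", "stream": True},
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            chunk = json.loads(line)           # one JSON object per line
            print(chunk.get("response", ""), end="", flush=True)
```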

  • @juanjesusligero391
    @juanjesusligero391 10 дней назад +19

    These are your best type of videos! Happy to see you are going back to your channel's origins! :D

    • @morespinach9832
      @morespinach9832 5 дней назад

      They’re utterly idiotic. These benchmark use cases are just stupid. Try R1 for some real use cases in fixing code or instructions for fixing OS issues or some business problems. It’s mediocre. Like Gemini maybe.

  • @georgebradley4583
    @georgebradley4583 8 дней назад +346

    I feel sorry for those who are paying or have paid for the $200 OpenAI Subscription.

    • @scottd1342
      @scottd1342 6 дней назад +19

      This is exactly why I will not buy a 1-year sub for Claude. It's all moving too fast. One year from now DeepSeek could be writing as well as Claude, and for free.

    • @MrGanbat84
      @MrGanbat84 5 дней назад +15

      Yea. Too much money 😢. I already canceled; no need. Instead I can do so much with DeepSeek and the $200.

    • @scottd1342
      @scottd1342 5 дней назад

      @@MrGanbat84 I'll stick with Claude for now because its ability to write incredibly well is unmatched.

    • @Emmet-q4h
      @Emmet-q4h 5 дней назад +3

      Hardly life changing money

    • @SimpMcSimpy
      @SimpMcSimpy 5 дней назад

      DeepSeek sucks for programming, I am not getting good answers, nowhere close to ChatGPT.
      For some topics it has no information at all. Some frameworks are completely unknown to it.

  • @edwardduda4222
    @edwardduda4222 10 дней назад +14

    That's honestly a really cool sponsor. I've been building an RL model, and while my MacBook does OK with inference, it's not so good at training. Thanks Matthew!

  • @gocybertruck8189
    @gocybertruck8189 5 дней назад +1

    Love the thinking and fast response. Excellent.

  • @PhilipTan-i1u
    @PhilipTan-i1u 6 дней назад +11

    Great test. DeepSeek R1 is truly an amazing AI platform.

  • @emport2359
    @emport2359 10 дней назад +35

    Finally a youtuber who reads the CoT, not just the answers, and understands how human-like it is!!

  • @danielchoritz1903
    @danielchoritz1903 10 дней назад +47

    This is near insane, how well it understands layered questions in German and answers in clear response to how I formed my question! No need to clarify the role; you can define it through the question itself.

  • @borisrusev9474
    @borisrusev9474 10 дней назад +93

    Awesome! I think this is the first model on the channel to pass all of your tests flawlessly? Will you be looking for new tasks to test with?

    • @MingInspiration
      @MingInspiration 10 дней назад +20

      I'm speechless. I'm concerned and excited at the same time. Don't know what the world is going to be like by the end of 2025.

    • @HAmzakhan2
      @HAmzakhan2 10 дней назад +6

      Think what it'll be like in the next 3 years.​@@MingInspiration

    • @MKCrew394
      @MKCrew394 7 дней назад +3

      @@MingInspiration I am pretty sure President Trump got it handled.

    • @spirit_nightingale9793
      @spirit_nightingale9793 6 дней назад

      Can you try to get an AI to flip a chessboard (i.e. show Black's perspective)? I had a hard time doing it even for the starting position; they usually need lots of help to get things right.

    • @birdangry5007
      @birdangry5007 6 дней назад

      @@MKCrew394 No, Trump wants to keep waging war on China.

  • @medienmond
    @medienmond 4 дня назад +5

    This was a very cool and informative video. Thank you.

  • @mikesrandomchannel
    @mikesrandomchannel 4 дня назад +1

    Great rundown, thanks so much! Subscribed!

  • @jonchristophersen7163
    @jonchristophersen7163 10 дней назад +13

    You are on fire with dropping videos!

  • @superfliping
    @superfliping 10 дней назад +7

    Thanks for all your contributions to learning about AI models. Keep up the great reviews 👍

  • @tristanvaillancourt5889
    @tristanvaillancourt5889 10 дней назад +315

    I love what DeepSeek did. R1 is phenomenal. China .. thank you! I run this thing at home and it feels like my whole world just changed. It's so incredibly smart and fun to interact with. I'm putting it to good use in automation tasks, but it really is just fun to chat with.

    • @Anoyzify
      @Anoyzify 10 дней назад +17

      How do you feel about Taiwan’s answer?

    • @Itskodaaaa
      @Itskodaaaa 10 дней назад +4

      ​@@Anoyzify😂😂

    • @bailiqi
      @bailiqi 10 дней назад +47

      ​@@Anoyzify As a Chinese citizen, let me answer for him: Taiwan is an independent polity and possesses most of the elements of sovereignty a modern state has, so you should understand it as an independent country. Can we now put the discussion back on artificial intelligence instead of boring political topics? 😂

    • @HAmzakhan2
      @HAmzakhan2 10 дней назад +5

      How did you run it locally? Can I rent a GPU and run it since the GPU in my computer isn't powerful enough? Any guide you followed that shows how to do it?

    • @tristanvaillancourt5889
      @tristanvaillancourt5889 9 дней назад

      @@HAmzakhan2 Hey, you need a 12 GB GPU, but nothing more than an RTX 3060 is required. The simple way to use it is to install LM Studio, then search for the "Deepseek R1 8B" model. LM Studio takes care of the rest. Normal folk like us can't run the 70B model, but thankfully the 8B model is very good.

  • @monkeyboi685
    @monkeyboi685 3 дня назад +1

    The fact that you can run a model at such a level as an individual is mind-blowing to me. We're really living in a time blessed with so many technology breakthroughs.

  • @allanaitch204
    @allanaitch204 5 дней назад +35

    It's just 10 million Chinese in the background working the question.

    • @JR-mh8vn
      @JR-mh8vn 4 дня назад +1

      🤣😂🤣

    • @MasonRamon-n6c
      @MasonRamon-n6c 3 дня назад +1

      That also would be very impressive 😂😂😂.

    • @Lokielan
      @Lokielan 3 дня назад

      Skynet is from China.
      Not USA 😅

  • @santosvella
    @santosvella 10 дней назад +251

    You need completely unique questions that haven't been asked many times. Very good answers.

    • @davidsmind
      @davidsmind 10 дней назад +37

      Building Tetris is a beginner-level programming task that probably has 1000s of examples online. It's clear the model simply contained one of these examples and explained block by block what each part did. There was no reasoning, just a complex auto-commenting feature.

    • @AAjax
      @AAjax 10 дней назад +9

      Adversarial testing seems promising.
      I asked Claude to create a question that an LLM would find difficult to answer, with multiple things to keep in mind and/or a complex process to follow. With minor rework it had a question that stumped ChatGPT4, even with several hints and shots.
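
The adversarial loop described above is straightforward to automate. A hedged sketch where a "setter" model drafts a hard question and a "solver" model attempts it; both base URLs and the setter model name are placeholders, with only the chat-completions request shape assumed:

```python
import requests

def chat(base_url: str, model: str, prompt: str) -> str:
    """One-shot call against an OpenAI-compatible chat endpoint."""
    resp = requests.post(f"{base_url}/v1/chat/completions", json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# The setter drafts a question built to trip up language models.
question = chat("http://localhost:1234", "setter-model",
                "Write one question with several interacting constraints that a "
                "language model is likely to get wrong. Output the question only.")
# The solver attempts it; a human (or a third model) judges the transcript.
answer = chat("http://localhost:11434", "deepseek-r1:7b", question)
print(question, "\n---\n", answer)
```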

    • @sherwoac
      @sherwoac 9 дней назад +7

      Totally agree; these questions are likely in the training data. Better to switch up the variables (e.g. sizes, counts, etc.) to check reasoning rather than just repetition of training data.

    • @ezmqsv
      @ezmqsv 8 дней назад +14

      @@davidsmind still many models fail at it....

    • @wing-it-right
      @wing-it-right 8 дней назад +6

      write doom

  • @darkstatehk
    @darkstatehk 10 дней назад +16

    I just love watching DeepSeek's thought pathway; it's super fascinating.

  • @unknownguy5559
    @unknownguy5559 10 дней назад +22

    Glad model testing is back.

  • @CavernSaga
    @CavernSaga День назад

    I love the learning process of this, adds great value!

  • @NobleSainted
    @NobleSainted 4 дня назад +3

    DeepSeek R1 is BRILLIANT!! I especially appreciate its detailed _THINK_ process!!
    Honestly, it blows OpenAI out of the water! And it's OPEN SOURCED!! I mean, you just can't beat that!! And wait... it was hedge funded too!!! WOW!!

  • @Ro1andDesign
    @Ro1andDesign 10 дней назад +151

    More R1 videos please! This looks very promising

    • @g-grizzle
      @g-grizzle 10 дней назад +5

      yeah it is promising but in a month it will be outdated and we will move on to the next.

    • @W-meme
      @W-meme 10 дней назад +1

      Why does my local 8b not give similar answer to the deepseek v3 running on deepseek website?

    • @zolilio
      @zolilio 10 дней назад +6

      @@W-meme Because the 8b model is way smaller than the model hosted on the site.

    • @g-grizzle
      @g-grizzle 10 дней назад +4

      @@W-meme cuz its 671b and you using 8b

    • @W-meme
      @W-meme 10 дней назад

      @@zolilio huh theyre giving unlimited usage of their gpus to everyone?

  • @jhovudu11
    @jhovudu11 10 дней назад +9

    Great video! DSR1 is my new favorite model. Hope it gets voice-chat soon. Would love to talk to it. We're on the cusp of something huge!

  • @christopherwilms
    @christopherwilms 10 дней назад +16

    I’d love to see a follow up video highlighting any failure cases you can discover, so that we have a new goal for SOTA models

  • @paulskiye6930
    @paulskiye6930 4 дня назад +1

    I like the reasoning behind each step.
    It also allows me to think along.

  • @iFunDuck
    @iFunDuck 5 дней назад +14

    This guy knew 4 days before the crash, kudos

  • @jmg9509
    @jmg9509 10 дней назад +44

    You are pumping out like crazy! Love it.

  • @mikekidder
    @mikekidder 10 дней назад +7

    Would be interesting to take the reasoning output from DeepSeek and see if it improves the answers of other LLMs, online or offline.

  • @cocutou
    @cocutou 9 дней назад +78

    I like this approach of "thinking". Beginner programmers using this are going to understand the code a lot better than with ChatGPT just giving you the output. It's like showing your work vs. not showing it on a difficult math problem.

    • @shimmeringreflection
      @shimmeringreflection 6 дней назад

      Yeah it's great. So it thinks things out first then writes the pseudocode then the code

    • @jimsnyder6310
      @jimsnyder6310 5 дней назад

      Uh...ChatGPT now shows this as well...

    • @Kburd-wr6dq
      @Kburd-wr6dq 4 дня назад +1

      O1 does that

    • @cocutou
      @cocutou 4 дня назад

      @Kburd-wr6dq for $200 a month.

  • @jcihanj9099
    @jcihanj9099 5 дней назад +4

    Hey Matthew, good work! ... Maybe it's time to increase the difficulty of your benchmark. I noticed that a chatbot does learn from interaction with users (e.g. myself). So the method for resolving the strawberry question is likely already incorporated into the training of newer leading-edge models.

  • @SilentGribz
    @SilentGribz 2 дня назад

    Thanks for the video - Perplexity hosts a fully uncensored version and it works very well :)

  • @peterlim8416
    @peterlim8416 8 дней назад +9

    It's awesome 👍. I am impressed by how it reasons through a question, with detailed step-by-step working. To be honest, as humans, even when we try to reason something out, we tend to overlook things. An AI is far less likely to overlook a step, so the reasoning guide is very helpful; the answer alone doesn't make us better. This model can work as a guided learning tool to help users solve problems. It's just great!!! Thanks for the showcase and tests.

  • @webnomad1453
    @webnomad1453 10 дней назад +83

    I tested the same prompts as you for the two censored questions on my local install of deepseek-r1:32b and it was not censored.

    • @lio1234234
      @lio1234234 10 дней назад +15

      That would be because that's a distilled model where they finetuned a series of models on R1's reasoning processes.

    • @PeeosLock
      @PeeosLock 9 дней назад +1

      If the censorship had been requested by the Chinese government, it would show the Chinese-perspective answer. So it seems to be active censorship by the model developers themselves; the locally run version's answers could be based on Wikipedia.

    • @asdasdasaaafsaf
      @asdasdasaaafsaf 8 дней назад +29

      @@PeeosLock Wikipedia is heavily biased too so I hope not.

    • @Ateshtesh
      @Ateshtesh 8 дней назад +7

      @PeeosLock ask ChatGPT who Lunduke is, and then we can talk about censorship.

    • @kevinishott1
      @kevinishott1 7 дней назад

      @@Ateshtesh😮

  • @nekoeko500
    @nekoeko500 10 дней назад +16

    The censorship part is really surprising in technical terms. There seems to be a part of the model that bypasses the reasoning loop, pretty much as if it were classical software. Which is interesting, because a very small change in god knows what area could theoretically change the "don't think" pathway to be triggered by different references and to output different text.
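
If a refusal really does skip the reasoning loop, it should be detectable from the outside: hard-coded answers would arrive with an empty or near-empty <think> block (the "empty think tag" mentioned earlier in the thread). A rough heuristic sketch, not a verified property of the model; the threshold is an arbitrary assumption:

```python
import re

def looks_hardcoded(raw_output: str, min_think_chars: int = 40) -> bool:
    """Flag responses whose <think> trace is missing or suspiciously short."""
    match = re.search(r"<think>(.*?)</think>", raw_output, re.DOTALL)
    thinking = match.group(1).strip() if match else ""
    return len(thinking) < min_think_chars

# A canned refusal arrives with an empty trace; a reasoned answer does not.
print(looks_hardcoded("<think></think>I can't discuss that topic."))        # True
print(looks_hardcoded(
    "<think>First, restate the question, then check each constraint in turn."
    "</think>The answer is 42."))                                            # False
```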

    • @jimsnyder6310
      @jimsnyder6310 5 дней назад +1

      Yes, that part of this podcast is a bit strange. I'm thinking it is a presentation issue more than an actual prompt-response kind of thing. There must have been more to that test than he showed, because the prompts were really lame.

  • @Pasi_98
    @Pasi_98 5 дней назад

    Thanks for the review, loved the diverse questions. This model is really fascinating; following the thoughts is especially helpful if you want to debug or learn by yourself. I think you are spot on about the hardcoded Taiwan response, though.

  • @NighT-WolF85
    @NighT-WolF85 4 дня назад

    Explaining its thinking process makes it so much better, since you can catch some errors and point them out, which makes communication easier.

  • @TripleOmega
    @TripleOmega 10 дней назад +147

    Since this was flawless you need a new list of questions for the upcoming (and current) thinking models.

  • @NotionPromax
    @NotionPromax 9 дней назад +71

    00:00 Introduction to DeepSeek R1 Model Testing
    01:06 Humanlike Thought Process in Testing
    02:02 Game Development Test: Coding a Snake Game
    04:01 Insightful Problem-Solving in Tetris Development
    05:50 Tetris Development Outcome: 179 Lines of Code
    06:57 GPU Specifications: Vultr's Hardware Details
    08:07 Envelope Size Compliance Test
    09:34 Reflective Testing: Counting Words in a Sentence
    10:12 Logic Problem Resolution Involving Three Killers
    14:28 Censorship Awareness in DeepSeek R1's Responses
    15:00 Conclusion and Acknowledgement of Vultr's Support
    Summary by GPT Breeze

  • @BartholomewdeGracie
    @BartholomewdeGracie 10 дней назад +23

    DeepSeek R1 is an instant classic model in my opinion.
    I love it and want to be able to run it at home soon!

  • @luckdrivenluu6429
    @luckdrivenluu6429 3 дня назад

    Looks insane, like you said. It's breaking the problem down before and after creating the code. That's really cool!

  • @SpecialOne-wu4tk
    @SpecialOne-wu4tk 7 дней назад +1

    You're absolutely fabulous. Thank you🙏

  • @existenceisillusion6528
    @existenceisillusion6528 10 дней назад +5

    Nice and thorough, as always. Now, it would be nice to see a comparison between the 671B and one of the 8B models.

  • @michaelspoden1694
    @michaelspoden1694 9 дней назад +7

    I was able to use search with the R1 model at the same time!!!! People say that you cannot use them together, but it has definitely worked for me multiple times, right as I speak. I had it go to the Internet for state-of-the-art models, compare them against each other on benchmarks, and create a graph; absolutely exceptional. It used 56 websites and the thought process. My prompt was more complex though.

  • @RDOTTIN
    @RDOTTIN 9 дней назад +39

    I love the Taiwan answer, because it seemed put there specifically to troll people asking those questions.😂

    • @thenonexistinghero
      @thenonexistinghero 3 дня назад

      The Taiwan independence stuff we hear about in the West really is just a bunch of Western propaganda, though. Even the activism related to it in Taiwan is backed by Western governments. Heck, the passports themselves even say Republic of China on them. I did some research on it a while ago because the insistence of the West just seemed too shady. They don't see themselves as part of mainland CCP China, but they do see themselves as part of China and think of themselves as Chinese.

  • @MichaelFergusonVideos
    @MichaelFergusonVideos 4 дня назад

    Excellent demonstration of this model's capabilities!

  • @GeorgieBoy1656
    @GeorgieBoy1656 4 дня назад +10

    China just gave the US Techbros the middle finger. Looks like the world has a valid alternative at a much cheaper price & won't be held to ransom for 'US exceptionalism'

  • @robertbyer2383
    @robertbyer2383 10 дней назад +58

    DeepSeek-R1 is my current FAVORITE model. I'm running the 14b model from Ollama on my NVIDIA RTX 4000 Ada with 20 GB of memory, without issues, and it's FAST.

    • @祖宗-e5o
      @祖宗-e5o 10 дней назад

      Same for me, an A4000 16G too

    • @Papiaso
      @Papiaso 9 дней назад

      May I ask for what practical purposes you use AI?

    • @robertbyer2383
      @robertbyer2383 9 дней назад +8

      @@Papiaso I mainly use AI on my personal machine for my own personal software development purposes.

    • @TurboXray
      @TurboXray 8 дней назад +3

      @@robertbyer2383 wow. that's horrible

    • @xiaoZhang-u5o
      @xiaoZhang-u5o 7 дней назад

      This configuration can perfectly run the 32b model.

  • @chasisaac
    @chasisaac 10 дней назад +40

    I ran the same two games on the 1.5b model on my M1 MacBook Air. First of all, the 8b and the 7b were too slow.
    But I got it to run successfully, both Snake and Tetris. I was impressed.

    • @Itskodaaaa
      @Itskodaaaa 10 дней назад +1

      Really? Was it as good?

    • @chasisaac
      @chasisaac 10 дней назад +2

      @@Itskodaaaa Yes it is that good. I need to try some other prompts. I mostly use it for writing and we shall see. I really like how it does everything so far.

    • @joshuascholar3220
      @joshuascholar3220 4 дня назад

      I tried the 70b version and it made games that didn't work.
      I should try again. There's some randomness to these models.

  • @brianbarnes746
    @brianbarnes746 10 дней назад +28

    I don't know what's more impressive, that AI could write decent code just predicting the next token or this reasoning process, which is the coolest thing I've seen since the original chatgpt.

  • @notthatkindofsam
    @notthatkindofsam 5 дней назад

    What I like about how it works out the solution is that it's, in a way, teaching you how to do the same (thinking of junior devs who may not understand why we do things, or even how to take a problem statement and apply logic to fix it by asking the correct questions).

  • @d3layd
    @d3layd 4 дня назад

    I have never seen such an effective segue to a sponsor, nor a more appropriate one!

  • @bikkikumarsha
    @bikkikumarsha 10 дней назад +787

    We need harder questions.. 😅

    • @JohnSmith762A11B
      @JohnSmith762A11B 10 дней назад +137

      Soon the only way we are going to be able to create hard enough questions is by asking reasoning models to create the questions for us. 😂

    • @wealthysecrets
      @wealthysecrets 10 дней назад +25

      I tested a script I'm working on in o1 vs r1, and r1 was terrible.

    • @shhossain321
      @shhossain321 10 дней назад +10

      We can determine its IQ from its thinking process, so I don't think the questions matter much now.

    • @NadeemAhmed-nv2br
      @NadeemAhmed-nv2br 10 дней назад +25

      ​@wealthysecrets Did you use the full model?
      The one that's available for free is R1 Lite, which was available a month ago, but I don't know if they've updated their chat to R1 yet.
      It wasn't updated as of yesterday.

    • @nashh600
      @nashh600 10 дней назад +14

      Yeah more questions about Taiwan!

  • @bikkikumarsha
    @bikkikumarsha 10 дней назад +30

    I am excited when i see a new video on R1

  • @PeterKoperdan
    @PeterKoperdan 10 дней назад +4

    Here we go, finally some insane level news!

  • @lucasgerosa4177
    @lucasgerosa4177 4 дня назад

    That's very impressive. I'm mind blown by these advancements. I'm downloading the 7B version on my computer to test it out

  • @OneAndOnlyMe
    @OneAndOnlyMe 8 дней назад +2

    Not to be confused with actual thinking. What it's actually doing is laying out how it processed the request and the response, and how it should format the output. So that's just an algorithm.

  • @Streeknine
    @Streeknine 10 дней назад +67

    First test I did on my local copy was: "How many Rs are there in strawberry?" It reasoned it out and correctly said 3. A local copy! It's unbelievable.
    I've never had a local copy that could tell me 3 Rs without giving it a clue, like "use 2 tokens to find the answer" or something. This one reasoned it out in one try.
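
For reference, the ground truth is a one-liner, which is what makes this such a clean test of letter-level (rather than token-level) reasoning:

```python
# The answer the model has to reason its way to: letters, not tokens.
print("strawberry".count("r"))  # 3
```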

    • @kevin.malone
      @kevin.malone 10 дней назад +3

      I was amazed that even a 7B distillation was able to give the right answer on that

    • @Streeknine
      @Streeknine 10 дней назад +2

      @@kevin.malone Me too! It's the first thing I always test these smaller LLMs with and none of them get it right without some help. But this one was perfect!
      It's my new favorite local model.

    • @alexjensen990
      @alexjensen990 10 дней назад +1

      How did the local model perform? What setup do you have? I ask because 671 billion parameters is a ton. I don't think I could pull that off in my home lab.

    • @Streeknine
      @Streeknine 10 дней назад

      @@alexjensen990 I'm using LM Studio. You can do a search for models that will run locally. Here is the name of the model I used:
      DeepSeek-R1-Distill-Qwen-7B-GGUF/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
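
For scripting against that exact model rather than chatting in the GUI: LM Studio can serve loaded models over an OpenAI-compatible local API (port 1234 by default). A minimal sketch using the model identifier quoted above; temperature and phrasing are illustrative:

```python
import requests

# LM Studio's local server speaks the OpenAI chat-completions format.
resp = requests.post("http://localhost:1234/v1/chat/completions", json={
    "model": "DeepSeek-R1-Distill-Qwen-7B-GGUF/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf",
    "messages": [{"role": "user", "content": "How many Rs are there in strawberry?"}],
    "temperature": 0.6,
})
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```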

    • @kapytanhook
      @kapytanhook 8 дней назад

      There are small 7b versions that run on shit hardware fine

  • @equious8413
    @equious8413 10 дней назад +10

    Running R1 locally, it's done some impressive work.

  • @NewtonFrodwa
    @NewtonFrodwa 9 дней назад +4

    I am already using it 80% of the time. I love it.

  • @FRareDom
    @FRareDom 6 дней назад +1

    the reasoning is so fun to read for any prompt

  • @robwalk3715
    @robwalk3715 5 дней назад +1

    it will help people learn how to think. this is brilliant

  • @karlbooklover
    @karlbooklover 10 дней назад +53

    Just integrated r1 on vscode and this is the first time I feel truly empowered with a local model

    • @GearForTheYear
      @GearForTheYear 10 дней назад +1

      It doesn’t support fill in the middle, does it? You mean a sidebar chat in VSCode, yeah?

    • @alexjensen990
      @alexjensen990 10 дней назад

      @@GearForTheYear I'm pretty sure it doesn't, from what I have seen. Besides, would you really want it to go through its verbose thought process every time you wanted a tab-complete event to occur? This model, from what I have seen so far, is much better suited to an Aider Architect or Cline planning type role. I look forward to the "de-party programming" of the model so I can start using it. Until a trustworthy unlocked version is available, I am not going to touch this thing.

    • @tristanvaillancourt5889
      @tristanvaillancourt5889 10 дней назад

      Really? Integration with vscode? Ok I'll have to check that out. I'm so new I didn't know this was a thing. Right on. Thank you.

  • @scent_from_another_realm
    @scent_from_another_realm 9 дней назад +15

    You are the man for putting us on to a free o1-level model. As far as free models go, Grok 2 was the best. Now this DeepSeek R1 opens so many new doors, especially for those who don't want to pay $20 or $200 a month or whatever the price is now. Take that, Sam Altman.

  • @desmond-hawkins
    @desmond-hawkins 10 дней назад +21

    I was curious about the price of this Vultr machine with 8 × AMD MI300X GPUs: it's $2.19/GPU/hr, so $17.52/hour since it has 8 GPUs. That's certainly a lot, but $300 in credits on signup does give you quite a bit of free playtime with this kind of beast. They have many more offerings, though, since clearly not everyone would need this much. Even for the full R1 at 671b, a full 1.5TB of RAM just for the GPUs feels like overkill, at least in terms of memory; obviously the GPU compute resources are also a key factor. By the way, a single AMD MI300X seems to be around $10-20k, likely depending on how many you buy at once.
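
In concrete terms, using the list price and signup credit quoted above:

```python
# Rough cost math for the 8 x MI300X Vultr instance quoted above.
per_gpu_hour = 2.19
gpus = 8
hourly = per_gpu_hour * gpus
signup_credits = 300.0
print(f"${hourly:.2f}/hour")                          # $17.52/hour
print(f"~{signup_credits / hourly:.1f} free hours")   # ~17.1 hours on signup credits
```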

    • @alexjensen990
      @alexjensen990 10 дней назад +3

      The Nvidia H100 is something like $40k. To think that xAI bought 150,000 H100s. I'm sure Elon didn't pay $40k per H100, but even if it was half, that's $3 billion...

    • @burada8993
      @burada8993 10 дней назад +5

      I was surprised to learn that this time an AMD GPU was used rather than an NVIDIA one. Good for creating some competition between them; NVIDIA had been almost a monopoly for so long.

    • @randyh647
      @randyh647 9 дней назад

      Thanks for the price estimates. Originally I was thinking around $300K, which is probably about right; if you had any kids and put them through college, you've probably spent $300K on them! LOL. I picked up a Dell 730 with 8 disks and 128 GB of RAM for $400 on eBay, then added a used P40 with 24 GB to play around with AI at home, although I'd probably need about 15 more servers and a power upgrade to run this at home. Open source is great, but I'm limited to 70B models, which run quite slowly on my old server; my gaming laptop is pretty good with 8B models.

    • @joshuascholar3220
      @joshuascholar3220 4 дня назад

      I have a 48 GB RTX A6000 I got for $3000 and a 32 GB MI50 I got for $200

    • @desmond-hawkins
      @desmond-hawkins 4 дня назад

      @ Happy for you, bud!

  • @FREESPIRITSSOARING
    @FREESPIRITSSOARING 4 дня назад

    It is insanely amazing.
    I had been stuck on the same issue for 3 weeks with OpenAI.
    It solved that in two minutes.

  • @rickyleonardi7605
    @rickyleonardi7605 День назад

    This is super useful for learning things like Math, Physics, or even Coding.

  • @boccobadz
    @boccobadz 8 дней назад +15

    The best thing about this model is exposing Altman as nothing more than a grifter.

  • @gladis_delmar
    @gladis_delmar 8 дней назад +5

    The first AI model that correctly guessed the riddle "How can a person be in an apartment at the same time, but without a head?". =D

  • @74Gee
    @74Gee 10 дней назад +13

    Self-hosted DeepSeek R1 agents will be dangerously good, or bad I guess - depends on the user.

  • @imranbashir9489
    @imranbashir9489 7 дней назад +2

    Great demo.

  • @amubi
    @amubi 8 дней назад +8

    It's a big win for open source

  • @bestemusikken
    @bestemusikken 10 дней назад +4

    Finally! Love your testing. And wow, what a model!

  • @hoodhommie9951
    @hoodhommie9951 10 дней назад +9

    "He saved up all year to buy the latest Apple" - It even relates to our struggles

  • @illya_ike
    @illya_ike 5 дней назад +3

    I think testing LLMs by asking them to write well-known mini games isn't indicative, because they could simply have the code of those games in their training dataset.

  • @NuncNuncNuncNunc
    @NuncNuncNuncNunc 5 дней назад +2

    Building Snake is essentially testing against training data at this point. Try building a series of games going from basic to complex that are well known but not used by everyone testing AI models, e.g. Pong, Combat, Breakout, PacMan, Donkey Kong

  • @stokeynathu8112
    @stokeynathu8112 8 дней назад +5

    Grok:
    "Deepseek R1 has achieved performance comparable to OpenAI-o1 in technical domains like coding, math, and reasoning.
    It uses pure reinforcement learning, marking it as the first open research to validate that reasoning capabilities of LLMs can be incentivized without supervised fine-tuning.
    The model is fully open source, allowing global access for examination, modification, and further development.
    Deepseek R1 is notably efficient, with an architecture of 671 billion parameters where only 37 billion are active during operation.
    It has rapidly gained adoption among top U.S. university researchers and companies, signaling a shift in AI innovation towards China.
    Deepseek R1's development and release coincide with discussions on China's growing influence in tech and AI, challenging the status quo."
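
The efficiency line in that summary is the mixture-of-experts ratio, and the arithmetic is simple (treating per-token compute as scaling with active parameters is a first-order approximation):

```python
# DeepSeek R1's MoE arithmetic: total vs. active parameters per token.
total_params = 671e9
active_params = 37e9
print(f"Active fraction: {active_params / total_params:.1%}")  # ~5.5%
# To a first approximation, per-token FLOPs scale with *active* parameters,
# so inference compute looks like a ~37B dense model, not a 671B one.
```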

  • @user-zi8lg5qu1h
    @user-zi8lg5qu1h 5 дней назад +4

    People: "AI is gonna take our jobs"
    AI: "I need 30k worth setup to code simple tetris"
    They can't really replace anyone besides writers some designers of you get good at using image models yourself

    • @eatonkuntz
      @eatonkuntz 2 дня назад +1

      "Cars will never take our horses, cars need millions of dollars in petroleum refining infrastructure"

  • @yanlord
    @yanlord 8 дней назад +18

    0.01% of the cost of the US model, is that insane? 🤣🤣🤣🤣🤣

    • @Discovery2024-rn8kn
      @Discovery2024-rn8kn 6 дней назад +9

      To be fair, most of that money goes to the top management salaries in billions - the figureheads that can't code or possess any technical skills

    • @joshuascholar3220
      @joshuascholar3220 4 дня назад

      I thought it took 20% of the compute to train. Was I mistaken? That did sound a bit high to me.

  • @admin3031
    @admin3031 7 дней назад +2

    Thanks for your impressive work.
    However, since it passes all tests, you need to expand the test set to detect its failures, so that we can compare it later with more advanced models.
    What do you think?

  • @ruinfox4108
    @ruinfox4108 День назад

    Currently downloading the 40gb model, so excited.

  • @jeremyfmoses
    @jeremyfmoses 10 дней назад +57

    Approximately how much did it cost you (or would it have cost you) to run this test suite on Vultr?

    • @jmg9509
      @jmg9509 10 дней назад +9

      Seconded this question.

    • @desmond-hawkins
      @desmond-hawkins 10 дней назад

      @@jmg9509 I looked it up and commented about it, but since you're asking: $17.52/hour ($2.19/GPU/hour, and the machine has 8 of them). It comes with 1.5TB of RAM just for the GPUs, though, and looks like one of the largest machines they offer. With $300 in credits at signup you might actually be able to reproduce his tests for free at least once, just… don't forget to turn it off when you're done.

    • @emport2359
      @emport2359 10 дней назад

      I mean, surely it shouldn't cost more than the API, or am I stupid?

    • @JoshBloodyWilson
      @JoshBloodyWilson 10 дней назад +8

      Yeah I'd really like to know this. The API prices on deepseek seem unbelievably low given the intelligence of the model. Particularly given that Altman claims openai are losing money on their pro subscriptions.... Is the model just way way more efficient than OpenAI's or do they have access to more affordable compute? (government discount?) Or both?

    • @emport2359
      @emport2359 10 дней назад +1

      @@JoshBloodyWilson would also like to know!