How did you run it locally? Can I rent a GPU and run it since the GPU in my computer isn't powerful enough? Any guide you followed that shows how to do it?
@@HAmzakhan2 Hey, you need a GPU with about 12 GB of VRAM, but nothing more than an RTX 3060 is necessary. The simple way to use it is to install LM Studio, then search for the "DeepSeek R1 8B" model. LM Studio takes care of the rest. Normal folk like us can't run the 70B model, but thankfully the 8B model is very good.
The fact that you can run a model at such a level as an individual is mind-blowing to me. We're really living in a time blessed with so many technology breakthroughs.
Building Tetris is a beginner-level programming task that probably has thousands of examples online. It's clear that the model just contained one of these examples and explained block by block what each aspect did. There was no reasoning, simply a complex auto-commenting feature.
Adversarial testing seems promising. I asked Claude to create a question that an LLM would find difficult to answer, with multiple things to keep in mind and/or a complex process to follow. With minor rework it had a question that stumped ChatGPT4, even with several hints and shots.
Totally agree. Those questions are likely in the training data; better to switch up the variables (e.g. sizes, counts, etc.) to check reasoning rather than just repetition of training data.
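A minimal sketch of that idea in Python, assuming a simple templating approach; the riddle wording and the parameter choices are my own illustration, not anything from the video:

```python
import random

# Template a well-known benchmark riddle with randomized nouns so a model
# can't just pattern-match the canonical version from its training data.
TEMPLATE = (
    "A marble is placed in a {container}. The {container} is turned "
    "upside down on a {surface}, then moved to the {destination}. "
    "Where is the marble?"
)

def perturbed_prompt(rng: random.Random) -> str:
    # Swap out the concrete nouns; the underlying physics stays the same,
    # so a model that truly reasons should still answer correctly.
    return TEMPLATE.format(
        container=rng.choice(["cup", "glass", "jar"]),
        surface=rng.choice(["table", "counter", "desk"]),
        destination=rng.choice(["microwave", "fridge", "oven"]),
    )

if __name__ == "__main__":
    rng = random.Random(42)
    print(perturbed_prompt(rng))
```

Running several seeds gives a small family of equivalent questions, which makes "memorized vs. reasoned" much easier to tell apart.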
DeepSeek R1 is BRILLIANT!! I especially appreciate its detailed _THINK_ process!! Honestly, it blows OpenAI out of the water! And it's OPEN SOURCED!! I mean, you just can't beat that!! And wait... it was hedge funded too!!! WOW!!
I like this approach of "thinking". Beginner programmers using this are gonna understand the code a lot better than ChatGPT just giving you the output. It's like showing work vs not showing work on a difficult math problem.
Hey Matthew, good work! ... Maybe it's time to increase the difficulty of your benchmark. I noticed that a chatbot does learn from interaction with users (e.g. myself). So the methods to resolve the strawberry question are likely already incorporated into the training of newer leading-edge models.
It's awesome 👍. I am impressed by how it reasons through a question, with detailed step-by-step explanations. To be honest, as humans, even when we try to reason something out, we tend to overlook things. The AI is far less likely to overlook a step, so the reasoning guide is very helpful; the answer alone does not make us better. This model can work as a guided learning tool to assist users in solving problems. It's just great!!! Thanks for the showcase and tests.
If the censorship were requested by the Chinese government, it would show the Chinese-perspective answer. Instead it seems to be active censorship by the model developers themselves; the locally run version's answers could be based on Wikipedia.
The censorship part is really surprising in technical terms. There seems to be a part of the model that bypasses the reasoning loop, pretty much as if it were classical software. Which is interesting, because a very small change in god knows what area could theoretically cause the "don't think" pathway to be triggered by different references and to output different text.
Yes, that part of this podcast is a bit strange. I am thinking that it is a presentation issue more than an actual prompt-response kind of thing. There must have been more to that test than he showed, because the prompts were really lame.
Thanks for the review, loved the diverse questions. This model is really fascinating; following the thoughts is especially helpful if you want to debug or learn by yourself. I think you are spot on about the hardcoded Taiwan response, though.
00:00 Introduction to DeepSeek R1 Model Testing
01:06 Humanlike Thought Process in Testing
02:02 Game Development Test: Coding a Snake Game
04:01 Insightful Problem-Solving in Tetris Development
05:50 Tetris Development Outcome: 179 Lines of Code
06:57 GPU Specifications: Vultr's Hardware Details
08:07 Envelope Size Compliance Test
09:34 Reflective Testing: Counting Words in a Sentence
10:12 Logic Problem Resolution Involving Three Killers
14:28 Censorship Awareness in DeepSeek R1's Responses
15:00 Conclusion and Acknowledgement of Vultr's Support
Summary by GPT Breeze
I was able to use search with the R1 model at the same time!!!! People say that you cannot use them together, but it has definitely worked for me multiple times, right as I speak. I had it go to the internet for state-of-the-art models, compare them against each other on benchmarks, and create a graph. Absolutely exceptional. It used 56 websites and utilized the thought process. My prompt was more complex, though.
The Taiwan independence stuff we hear about in the West really is just a bunch of Western propaganda though. Even all of the activism related to it in Taiwan is backed by Western governments. Heck, the passports themselves even say Republic of China on them. I did some research on it a while ago because the insistence of the West just seemed too shady. They don't see themselves as part of mainland CCP China, but they do see themselves as part of China and think of themselves as Chinese.
China just gave the US Techbros the middle finger. Looks like the world has a valid alternative at a much cheaper price & won't be held to ransom for 'US exceptionalism'
I ran the same two games on 1.5b model on my M1 MacBook Air. First of all the 8b and the 7b were too slow. But I got it to run successfully, both snake and Tetris. I was impressed.
@@Itskodaaaa Yes it is that good. I need to try some other prompts. I mostly use it for writing and we shall see. I really like how it does everything so far.
I don't know what's more impressive, that AI could write decent code just predicting the next token or this reasoning process, which is the coolest thing I've seen since the original chatgpt.
What I like about how it's working out the solution, is it's in a way teaching you how to do the same. (Thinking of junior devs who may not understand why we do things or even how to take a problem statement and apply logic to fix it by asking the correct questions).
@wealthysecrets Did you use the full model? The one that's available for free is R1 Lite, which was available a month ago, but I don't know if they've updated their chat to R1 yet. It wasn't updated as of yesterday.
Not to be confused with actual thinking. What it's actually doing is laying out how it processed the request and the response, and how it should format the output. So that's just an algorithm.
First test I did on my local copy was.. "How many Rs are there in strawberry?" It reasoned it out and correctly said 3. A local copy! It's unbelievable. I've never had a local copy that could tell me 3 R's without giving it a clue like use 2 tokens to find the answer or something. This reasoned it in one try.
@@kevin.malone Me too! It's the first thing I always test these smaller LLMs with and none of them get it right without some help. But this one was perfect! It's my new favorite local model.
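For what it's worth, the ground truth these letter-counting tests are checked against is trivial to compute in code, which is what makes the question such a clean probe of tokenizer-level blind spots:

```python
# Count occurrences of a letter in a word, case-insensitively.
def count_letter(word: str, letter: str) -> int:
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # prints 3
```

The model has to get there by reasoning over tokens, which is exactly why smaller LLMs so often miss it.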
How did the local model perform? What setup do you have? I ask because 670 billion parameters is a ton. I don't think that I could pull that off in my home lab.
@@alexjensen990 I'm using LM Studio. You can do a search for models that will run locally. Here is the name of the model I used: DeepSeek-R1-Distill-Qwen-7B-GGUF/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
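If you want to script against that setup rather than chat in the GUI, LM Studio can expose an OpenAI-compatible local server (by default on localhost port 1234, as far as I know). A rough sketch, where the URL and the model id are assumptions based on the setup described above:

```python
import json
import urllib.request

# Default LM Studio local-server endpoint (assumption; check your LM Studio settings).
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_chat_request(prompt: str, model: str) -> dict:
    # OpenAI-compatible chat payload understood by LM Studio's local server.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
    }

def ask(prompt: str) -> str:
    # Model id is an assumption; use whatever name LM Studio shows for your download.
    payload = build_chat_request(prompt, "deepseek-r1-distill-qwen-7b")
    req = urllib.request.Request(
        LMSTUDIO_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With the server running you would call `ask("How many Rs are in strawberry?")` and get the full response, chain of thought included.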
@@GearForTheYear I'm pretty sure that it doesn't from what I have seen. Besides that would you really want it to go through its verbose thought process every time you wanted a tab to complete type event to occur? This model, from what I have seen so far, is much better suited for an Aider Architect or Cline Planning type role. I look forward to the "de-party programming" of the model so I can start using it. Until a trustworthy unlocked version is available I am not going to touch this thing.
You are the man for putting us on to a free o1-level model. As far as free models go, Grok 2 was the best. Now this DeepSeek R1 opens so many new doors, especially for those who don't want to pay $20 or $200 a month or whatever the price is now. Take that, Sam Altman.
I was curious about the price of this Vultr machine with 8 × AMD MI300X GPUs: it's $2.19/GPU/hr, so $17.52/hour since it has 8 GPUs. That's certainly a lot, but $300 in credits on signup does give you quite a bit of free playtime with this kind of beast. They have many more offerings though, since clearly not everyone would need this much. Even for the full R1 at 670b, a full 1.5TB of RAM just for the GPUs feels overkill - at least in terms of memory, obviously the GPU compute resources are also a key factor. By the way, a single AMD MI300X seems to be around $10-20k, likely depending on how many you buy at once.
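A quick sanity check on the arithmetic above (the price and credit figures are the ones quoted in this comment, not official numbers):

```python
# Vultr pricing figures as quoted above.
PRICE_PER_GPU_HOUR = 2.19   # USD per GPU per hour
GPUS = 8
SIGNUP_CREDIT = 300.00      # USD of free credit on signup

hourly = PRICE_PER_GPU_HOUR * GPUS
free_hours = SIGNUP_CREDIT / hourly
print(f"${hourly:.2f}/hour, ~{free_hours:.1f} free hours on credit")
# prints: $17.52/hour, ~17.1 free hours on credit
```

So the $300 credit buys roughly 17 hours on the full 8-GPU machine, plenty to reproduce the tests in the video.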
The Nvidia H100 is something like $40k. To think that xAI bought 150,000 H100s. I'm sure Elon didn't pay $40k per H100, but even if it was half, that's $3 billion...
I was surprised to learn that this time an AMD GPU was used rather than an Nvidia one. Good for creating some competition between them; Nvidia had held a near-monopoly for so long.
Thanks for the price estimates. Originally I was thinking around $300K, which is probably about right. If you had any kids and put them through college, then you've probably spent $300K on them! LOL. I picked up a Dell 730 with 8 disks and 128 GB of RAM for $400 on eBay, then added a used P40 with 24 GB to play around with AI at home, although I'd probably need about 15 more servers and an upgrade of my power to run this at home. Open source is great, but I'm limited to 70B models, which run quite slowly on my old server; my gaming laptop is pretty good with 8B models.
I think testing LLM by asking them to write well known mini games isn't indicative because they could simply have code of those games in their training dataset.
Building Snake is essentially testing against training data at this point. Try building a series of games going from basic to complex that are well known but not used by everyone testing AI models, e.g. Pong, Combat, Breakout, PacMan, Donkey Kong
Grok: "Deepseek R1 has achieved performance comparable to OpenAI-o1 in technical domains like coding, math, and reasoning. It uses pure reinforcement learning, marking it as the first open research to validate that reasoning capabilities of LLMs can be incentivized without supervised fine-tuning. The model is fully open source, allowing global access for examination, modification, and further development. Deepseek R1 is notably efficient, with an architecture of 671 billion parameters where only 37 billion are active during operation. It has rapidly gained adoption among top U.S. university researchers and companies, signaling a shift in AI innovation towards China. Deepseek R1's development and release coincide with discussions on China's growing influence in tech and AI, challenging the status quo."
People: "AI is gonna take our jobs" AI: "I need 30k worth setup to code simple tetris" They can't really replace anyone besides writers some designers of you get good at using image models yourself
Thanks for your impressive work. However, since it passes all the tests, you need to expand the test set to detect its failures, so that we can compare it later with more advanced models. What do you think?
@@jmg9509 I looked it up and commented about it, but since you're asking: $17.52/hour ($2.19/GPU/hour and the machine has 8 of them). It comes with 1.5TB of RAM just for the GPUs though, and looks like one of the largest machines they offer. With a $300 in credits at signup you might actually be able to reproduce his tests for free at least once, just… don't forget to turn it off when you're done.
Yeah, I'd really like to know this. The API prices on DeepSeek seem unbelievably low given the intelligence of the model, particularly given that Altman claims OpenAI is losing money on its pro subscriptions... Is the model just way more efficient than OpenAI's, or do they have access to more affordable compute (a government discount)? Or both?
An open source AI took OpenAI’s job. That’s poetic justice.
unfortunately it's Chinese and has censorship re Chinese political issues
An open AI overtook the most famous closed AI company 'OpenAI'😂
From a dictator's country with zero privacy and broken patent laws. 😂
@@brucelin8950 Cope hard in your USA dreams. You are still at stage 1. 4 more to go. 🤣
OpenAI blocked China to access it. Deepseek made its model / source code open to all. What an irony !
It actually fills in the thinking gaps allowing you follow along and learn with it. That is super cool
This is very underrated, seeing the thought process is great. Having to go through these as a human is annoying but would love to see if running it on its own thought chains could detect issues.
@@patruff ollama has it
@@tengdayz2 what do you mean? Is there a command or something? I've used ollama before but don't know how to add the response as an input with a message like "critique this response and try harder" or something
@@patruff you can use the ollama run command in the shell where it's installed to pull the model. Then use the model to answer this question :). I prefer to do my own digging, but I'm encouraging you to satisfy your own curiosity.
@tengdayz2 okay I'm sort of confused but I'm guessing you mean just conversing with the model, I do that, it's good, but I'm looking for scripting out the capture of thought chains so I can use it for fine-tuning later
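A sketch of what that scripting could look like, assuming the model emits its chain of thought inside `<think>…</think>` tags (as the R1 distills served through ollama do); the helper names here are my own, not part of any tool:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thoughts(raw: str) -> tuple[str, str]:
    # Separate the captured chain of thought from the final answer.
    m = THINK_RE.search(raw)
    thoughts = m.group(1).strip() if m else ""
    answer = THINK_RE.sub("", raw).strip()
    return thoughts, answer

def critique_prompt(question: str, thoughts: str, answer: str) -> str:
    # Feed the captured chain back in as a follow-up turn.
    return (
        f"Question: {question}\n"
        f"Your earlier reasoning was:\n{thoughts}\n"
        f"Your earlier answer was: {answer}\n"
        "Critique this reasoning and try harder."
    )

raw = "<think>2 + 2 is 4.</think>The answer is 4."
thoughts, answer = split_thoughts(raw)
print(thoughts)  # prints: 2 + 2 is 4.
print(answer)    # prints: The answer is 4.
```

You could loop this over `ollama run` output (or the ollama HTTP API) and write each (question, thoughts, answer) triple to a JSONL file for fine-tuning later.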
I like the reasoning more than the actual response, totally fascinating
Absolutely love the reasoning. I often have a hard time thinking of various edge cases, and this eliminates that and allows me to think more creatively.
Used it for difficult, earnest work, and I also found the internal monologue to be every bit as helpful as the output response. It opened up new conversation opportunities that brought out completely new responses that I otherwise couldn't have generated without that information. Seeing the monologue is way more important than I thought, maybe because I didn't get to see it from o1.
Well, studying reasoning is WAY MORE USEFUL but so many lazy humans out there
@Fireneedsair good point
Have you all looked up what they did with the moth brain?
The addition of the reasoning process is so SO much more valuable to the user than just giving a response only, this shit's crazy
I agree!
That is correct, CoT is nice, but it's not always desired. You can pump up a bad model with this technique. Now think of taking a properly good model from OpenAI/Anthropic and slapping CoT (chain of thought) onto it. That would surpass DeepSeek by a lot. It's just a post-processing technique, something you want to be able to turn on and off.
@@HermanWillems Like...why read the book when you can judge its cover?
I just tried it locally and asked it a few questions from my personal field of expertise. It provided me in real time with answers on the level of a PhD-grade expert in this field. Also, it can not only explain things plausibly, but it is able to apply this knowledge to solve problems in a meaningful way. I am very impressed and, frankly, scared.
So now that I have a complete set of PhD-level experts of all fields on my graphics card, I just need to make good use of the team. Resources on par with an enterprise-level talent pool at hand. I struggle to fathom what this implies.
This tech revolution is just too fundamental for society to integrate into our daily lives. It might take decades to accelerate productivity in all fields.
It's crazy how u have that power in ur hand, lol
It blew my mind!!
Wasn’t it sooooo human like?
What level of education do you need to ask the AI the question, and what level is required to understand the AI's answer? Using DeepSeek, I wouldn't be able to replace you. I wouldn't know what prompt to create and I wouldn't understand the answer.
following the chain of thought is such a satisfying way of learning
This is a competition between the Chinese in the US and the Chinese in China.
😂
👍😂👍
No wonder Scale AI ceo lied. 😅
hahahahahahah
No don't say that 😂
I just used it to assist with a statistical signal processing coursework to estimate target trajectory and velocity from millimeter wave radar data.
It's not an easy assignment and it went wrong, but it provided ideas and steps to solve the assignment so I could easily correct it, which is much better than other LLMs.
This is a common problem with LLM models, sometimes the output is worthless but the 'thought process' is enlightening.
@Akalin123 love this comment, and had a similar experience in my technical domain.
And this is a side product of a Chinese hedge fund; they do it for fun 🤣🤣🤣🤣🤣
Huawei is gonna insta buy them, lol.
@@Mandom007 DeepSeek is moving toward using Huawei GPUs. The latest test results show performance about 5% lower than the H100, but 75% cheaper.
There are Western hedge funds that could easily do the same just for fun, but they don't. They just hoard the billions in some vault instead.
The founder is a quant managing hundreds of billions on the Chinese stock market. He has fewer than 100 people and made this model as a side interest for about $5 million.
@@yudogcome5901 holy cow, that's insane cost saving, and now Huawei will rise again, what a time !!!
I asked DeepSeek (using the web chat) about some detailed and hard-to-grasp concepts from Roger Penrose's Conformal Cyclic Cosmology theory. It didn't contradict itself, it didn't give bullshitty nonsense answers, and it addressed all the points correctly. I am mind-blown.
Ask it about the Chinese government.
@@StuggleIsSurrealwawa is barking
@@StuggleIsSurreal Why are you always obsessed with grand narratives?
@@StuggleIsSurrealmedia slave
@@StuggleIsSurreal Eating too many sour grapes, thats so sad。。。haha
If this is not "innovation" by China, IDK what is! Well done! Credit where credit is desrved!
Pretty staggering to imagine what they might be able to do if they were allowed to import modern GPUs.
@ You would be living in another dimension if you really think they don't have GPUs sold under the table.
@@hqcart1 His point is still valid: imagine what they could do if they were allowed to import them.
@@JohnSmith762A11B I remember when I went to art school there was this quote: the greatest enemy of art is the lack of limitations 😂
@@hqcart1 Of course they do, but not in the quantities the big US frontier labs have them.
You know what is even more cool? You can run the distilled R1-32B model on a medium-grade personal PC and get this Tetris game done locally, plus the reasoning questions answered. This is some crazy shit when you compare it to what we could do a ~year ago with local models. I ran it today on an i5-13600K with 32 GB RAM and an RTX 4070 Super (12 GB VRAM), and damn, I had a stunning 5 tok/sec... so some tasks could take time. Yet it's able to complete the tasks you gave here locally on such a mediocre machine.
We're cooked man. Like holy cow cooked.
The 8b on my RTX 3060 with a really old i7 was pumping out 45 tokens/s. It's not 32b but based on the published performance graphs, the 8b is no slouch. 5 tok/s with a 32b on a home PC is still pretty good. I'm in love with R1.
I did it with an i5-8500 (48 GB RAM) and a 3060 (12 GB VRAM), using the 70B parameter model. It was more like two tokens a second, with a chain of thought latency of a minute or two before each answer. But yes, all of this really does run on potatoes. That's why I think embargoes just make it more expensive for China to develop these models without actually limiting what they can do. They just have to use more power, and maybe twice the time. No single answer will be fast, but that is offset by running a ton of operations in parallel.
OpenAI is who is really cooked. Now they have to know that whatever they release, it will only be bleeding edge for three months before China replicates it and open sources it. This means the whole business model for OpenAI is non-viable. The time window to recoup return on investment is just not there.
Yes this is what i've been spending most of my time testing
LM Studio is not working for me on AMD, with 16 GB VRAM and 128 GB RAM.
Yeah, I feel pretty much like I'm back in the 90s, launching some of the first primitive 3D games on my PC and shitting myself from hype while looking at Lara Croft's triangle boobs at 6 FPS, honestly thinking it's quite good performance, lol.
Now, when I think about how much of a leap we took in terms of graphics since then... if the same (at least) happens to AI in the next few years…
I love how accurate and human like the thinking process is.
They are using real human brains and growing them in lab for these
@@93_SUPREME yeah they can use the prisoner brain, i mean how can they be this cheap lmao
@@syarifairlangga4608 yeah it’s really disgusting shit the world is fucked
@@ccdj35 this shit has become demonic and it’s just the beginning
@@syarifairlangga4608 yup I didn’t think about that probably exactly what they’re doing
Wow, simple and genius at a different level. Deepseek is amazing.
I asked it a question about mechanics that paid version of chatgpt kept going around in circles on and apologizing, same with Gemini. This one got it second try, first time almost had it.
Very impressed!
To be honest, this model is what we would have expected to come out of OpenAI, given the initial promises years ago. And instead… it comes from China, it is better than the best of OpenAI, it costs 60 times less, and it is truly open source! Chapeau, China!
It’s not better than OpenAI.
The public models we compare are probably a couple months to years old vs internal projects.
@@ArturoGarzaIDit is
@@ArturoGarzaID you got any evidence? I don’t have a dog in the fight, but I need evidence if I’m gonna pick a preference
@@ArturoGarzaID similar results, open source AND 60 times cheaper == better !
should become a meme
For Mr Trump?
@@simplexj4298 for sam altman
For isreal or Jeffrey?
Censorship 🎉
@TheMrTape Wrong, the user meant that an empty think tag is the meme 👌 because it shows the lack of thinking behind censorship 😘
I'm canceling openai subscription. Seeing the thinking gives me so much more to work with. Why would anyone choose o1 unless it was much better, which it isn't?
same
Agreed.
The integration into github keeps me there for now
done!
O1 is user friendly and I can use it on my iOS.
Its crazy how this is literally how we as humans think, even about seemingly simple problems like the marble problem. The amazing thing is we do this EXTREMELY fast so it doesn't feel like we're going through all these steps, but we do.
Except that very few - if any - humans can think as clearly and consistently as Deepseek.
@@donkeychan491 not if you exclude france
It seems they have gotten around the need for realtime responses by putting out the thinking processes as it computes. Knowing we don't read all that quickly, it drops out the text at a slow pace while it's doing the crunching at the pace the hardware will allow. And thus solving the issue of needing the biggest and best, because it doesn't have someone waiting for the answer, they're too distracted following the thinking method.
These are your best type of videos! Happy to see you are going back to your channel's origins! :D
They’re utterly idiotic. These benchmark use cases are just stupid. Try R1 for some real use cases in fixing code or instructions for fixing OS issues or some business problems. It’s mediocre. Like Gemini maybe.
I feel sorry for those who are paying or have paid for the $200 OpenAI Subscription.
This is exactly why I will not buy a 1-year sub for Claude. It's all moving too fast. One year from now, DeepSeek could be writing as well as Claude, and for free.
Yeah. Too much money. 😢 I canceled already, no need. Instead, I can do so much with DeepSeek and the $200.
@@MrGanbat84 I'll stick with Claude for now because its ability to write incredibly well is unmatched.
Hardly life changing money
DeepSeek sucks for programming, I am not getting good answers, nowhere close to ChatGPT.
For some topics it has no information at all. Some frameworks are completely unknown to it.
That's honestly a really cool sponsor. I've been building an RL model, and while my MacBook does OK with inference, it's not so good with training. Thanks Matthew!
Love the thinking and fast response. Excellent.
Great test. DeepSeek R1 is truly an amazing AI platform.
Finally a youtuber who reads the CoT, not just answers, and understands how human like it is!!
This is near insane, how well it understands layered questions in German and answers in clear response to how I formed my question! No need to clarify the role; you can define it through the question itself.
Awesome! I think this is the first model on the channel to pass all of your tests flawlessly? Will you be looking for new tasks to test with?
I'm speechless. I'm concerned and excited at the same time. Don't know what the world is going to be like by the end of 2025.
Think what it'll be like in the next 3 years.@@MingInspiration
@@MingInspiration I am pretty sure President Trump got it handled.
can you try to let an AI flip a chessboard(i.e. show black's perspective)? I had a hard time doing it even for the starting position, they usually need lots of help to get things right
@@MKCrew394 No, Trump wants to keep waging war against China.
This was a very cool and informative Video. Thank you.
Great rundown, thanks so much! Subscribed!
You are on fire with dropping videos!
Thanks for all your contributions to learning about AI models. Also, keep up the great reviews 👍
I love what DeepSeek did. R1 is phenomenal. China .. thank you! I run this thing at home and it feels like my whole world just changed. It's so incredibly smart and fun to interact with. I'm putting it to good use in automation tasks, but it really is just fun to chat with.
How do you feel about Taiwan’s answer?
@@Anoyzify😂😂
@@Anoyzify As a Chinese citizen, let me answer for him: Taiwan is an independent regime, and it possesses most of the attributes of sovereignty that a modern state has; you should understand it as an independent country. Can we now focus the discussion on AI instead of boring political topics? 😂
How did you run it locally? Can I rent a GPU and run it since the GPU in my computer isn't powerful enough? Any guide you followed that shows how to do it?
@@HAmzakhan2 Hey, you need a 12GB GPU, but nothing more than an RTX 3060 is needed. The simplest way to use it is to install LM Studio, then search for the "DeepSeek R1 8B" model. LM Studio takes care of the rest. Normal folks like us can't run the 70B model, but thankfully the 8B model is very good.
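For anyone who'd rather script it than use the LM Studio GUI, here's a rough sketch of calling a locally served model through Ollama's REST API. This assumes Ollama is installed, a model like `deepseek-r1:8b` has been pulled, and the server is running on its default port; the endpoint and payload shape follow Ollama's documented `/api/generate` format:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> dict:
    # Payload shape for Ollama's /api/generate endpoint;
    # stream=False returns one JSON object instead of chunked output.
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_model(prompt: str, model: str = "deepseek-r1:8b") -> str:
    # Assumes an Ollama server listening on localhost:11434.
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server):
# print(ask_local_model("How many Rs are in strawberry?"))
```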
The fact that you can run a model at such a level as an individual is mind-blowing to me. We're really living in a time blessed with so many technology breakthroughs.
It's just 10 million Chinese in the back ground working the question.
🤣😂🤣
That also would be very impressive 😂😂😂.
Skynet is from China.
Not USA 😅
You need completely unique questions that haven't been asked many times. Very good answers.
Building Tetris is a beginner-level programming task that probably has thousands of examples online. It's clear the model just recalled one of these examples and explained block by block what each part did. There was no reasoning, simply a complex auto-commenting feature.
Adversarial testing seems promising.
I asked Claude to create a question that an LLM would find difficult to answer, with multiple things to keep in mind and/or a complex process to follow. With minor rework it had a question that stumped ChatGPT4, even with several hints and shots.
totally agree, questions likely in the training data, better to switch up the variables (eg. sizes, counts, etc) to check reasoning, not just repeating training data.
@@davidsmind still many models fail at it....
write doom
I just love watching DeepSeek's thought pathway, it's super fascinating.
Glad model testing is back.
I love the learning process of this, adds great value!
DeepSeek R1 is BRILLIANT!! I especially appreciate its detailed _THINK_ process!!
Honestly, it blows OpenAI out of the water! And it's OPEN SOURCED!! I mean, you just can't beat that!! And wait... it was hedge funded too!!! WOW!!
More R1 videos please! This looks very promising
yeah it is promising but in a month it will be outdated and we will move on to the next.
Why does my local 8b not give similar answer to the deepseek v3 running on deepseek website?
@@W-meme Because the 8b model is way smaller than the model hosted on the site.
@@W-meme cuz its 671b and you using 8b
@@zolilio Huh, they're giving unlimited usage of their GPUs to everyone?
Great video! DSR1 is my new favorite model. Hope it gets voice-chat soon. Would love to talk to it. We're on the cusp of something huge!
I’d love to see a follow up video highlighting any failure cases you can discover, so that we have a new goal for SOTA models
I like the reasoning behind each step.
It also allows me to think along.
This guy knew 4 days before the crash, kudos
No, that was the Pelosis'
You are pumping out like crazy! Love it.
a new model that actually deserves the hype
Would be interesting to take the reasoning output from DeepSeek and see if it improves the answers of other LLMs, online or offline.
I like this approach of "thinking". Beginner programmers using this are gonna understand the code a lot better than ChatGPT just giving you the output. It's like showing work vs not showing work on a difficult math problem.
Yeah it's great. So it thinks things out first then writes the pseudocode then the code
Uh...ChatGPT now shows this as well...
O1 does that
@Kburd-wr6dq for $200 a month.
Hey Matthew, good work! ... Maybe it's time to increase the difficulty of your benchmark. I noticed that a chatbot does learn from interaction with users (e.g. myself). So the method for resolving the strawberry question is likely already incorporated into the training of newer leading-edge models.
Thanks for the video - Perplexity hosts a fully uncensored version and it works very well :)
It's awesome 👍. I'm impressed by how it reasons through a question, with very detailed step-by-step explanations. To be honest, as humans, even when we try to reason something out, we tend to overlook things. The AI doesn't overlook them, so the reasoning guide is very helpful; the answer alone doesn't make us better. This model can work as a guided-learning tool to assist users in solving problems. It's just great!!! Thanks for the showcase test.
I tested the same prompts as you for the two censored questions on my local install of deepseek-r1:32b and it was not censored.
That would be because that's a distilled model where they finetuned a series of models on R1's reasoning processes.
If the censorship were requested by the Chinese government, it would show the Chinese-perspective answer. So this should be active censorship by the model developers themselves; the local version's answer could be based on Wikipedia.
@@PeeosLock Wikipedia is heavily biased too so I hope not.
@PeeosLock Ask ChatGPT who Lunduke is, and then we can talk about censorship.
@@Ateshtesh😮
The censorship part is really surprising in technical terms. There seems to be a part of the model that bypasses the reasoning loop, pretty much like it was classical software. Which is interesting, because a very small change in god knows what area could theoretically change the "dont think" pathway to be triggered by different references and to output different text
Yes, that part of this podcast is a bit strange. I am thinking that it is a presentation issue more than an actual prompt-response kind of thing. There must have been more to that test than he showed, because the prompts were really lame.
Thanks for the review, loved the diverse questions. This model is really fascinating, especially following the thoughts is really helpful if you want to do debugging or learn by yourself. I think you are spot on about the hardcoded taiwan response though.
Explaining its thinking process makes it so much better, since you can catch some errors and point them out, which makes communication easier.
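Since R1-style models wrap their reasoning in `<think>` tags, you can separate the monologue from the final answer programmatically, e.g. to review the reasoning for errors or to save the chains for later fine-tuning. A minimal sketch, assuming the `<think>…</think>` convention these models use (the sample response text is made up for illustration):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split an R1-style response into (reasoning, answer)."""
    match = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
    if not match:
        # No think block found: everything is the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()
    return reasoning, answer

raw = "<think>Count the letters: s-t-r-a-w-b-e-r-r-y has three Rs.</think>There are 3 Rs."
thought, answer = split_reasoning(raw)
print(thought)  # the chain of thought
print(answer)   # the final answer only
```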
Since this was flawless you need a new list of questions for the upcoming (and current) thinking models.
Would like to see a video on this
Up
What was "flawless" about censoring information?
@@earlysda Democratic AI?
00:00 Introduction to DeepSeek R1 Model Testing
01:06 Humanlike Thought Process in Testing
02:02 Game Development Test: Coding a Snake Game
04:01 Insightful Problem-Solving in Tetris Development
05:50 Tetris Development Outcome: 179 Lines of Code
06:57 GPU Specifications: Vultr's Hardware Details
08:07 Envelope Size Compliance Test
09:34 Reflective Testing: Counting Words in a Sentence
10:12 Logic Problem Resolution Involving Three Killers
14:28 Censorship Awareness in DeepSeek R1's Responses
15:00 Conclusion and Acknowledgement of Vultr's Support
Summary by GPT Breeze
DeepSeek r1 is an instant classic model in my opinion.
I love it and want to be able to run it home- soon!
Looks insane, like you said. It's breaking it down before and after creating the code. That's really cool!
You're absolutely fabulous. Thank you🙏
Nice and thorough, as always. Now, it would be nice to see a comparison between the 671B and one of the 8B models.
I was able to use search with the R1 model at the same time!!!! People say that you cannot use them together, but it is definitely working for me right as I speak. I had it go to the Internet for state-of-the-art models, compare them against each other on benchmarks, and create a graph. Absolutely exceptional. It used 56 websites and utilized the thought process. My prompt was more complex, though.
I love the Taiwan answer, because it seemed put there specifically to troll people asking those questions.😂
The Taiwan independence stuff we hear about in the West really is just a bunch of Western propaganda though. Even all of the activism related to it in Taiwan is backed by Western governments. Heck, the passports themselves even say Republic of China on them. I did some research on it a while ago because the insistence of the West just seemed too shady. They don't see themselves as part of mainland CCP China, but they do see themselves as part of China and think of themselves as Chinese.
Excellent demonstration of this model's capabilities!
China just gave the US Techbros the middle finger. Looks like the world has a valid alternative at a much cheaper price & won't be held to ransom for 'US exceptionalism'
DeepSeek-R1 is my current FAVORITE model. I'm running the 14b model from Ollama with my NVidia RTX 4000 Ada with 20G ram without issues and it's FAST.
same to me,A4000 16G too
my I ask for what practical purposes do you use AI ?
@@Papiaso I mainly use AI on my personal machine for my own personal software development purposes.
@@robertbyer2383 wow. that's horrible
This configuration can perfectly run the 32b model.
I ran the same two games on 1.5b model on my M1 MacBook Air. First of all the 8b and the 7b were too slow.
But I got it to run successfully, both snake and Tetris. I was impressed.
Really? Was it as good?
@@Itskodaaaa Yes it is that good. I need to try some other prompts. I mostly use it for writing and we shall see. I really like how it does everything so far.
I tried the 70b version and it made games that didn't work.
I should try again. There's some randomness to these models.
I don't know what's more impressive, that AI could write decent code just predicting the next token or this reasoning process, which is the coolest thing I've seen since the original chatgpt.
What I like about how it works out the solution is that it's, in a way, teaching you how to do the same. (Thinking of junior devs who may not understand why we do things, or even how to take a problem statement and apply logic to fix it by asking the correct questions.)
I have never seen such an effective segue to a sponsor, nor a more appropriate one!
We need harder questions.. 😅
Soon the only way we are going to be able to create hard enough questions is by asking reasoning models to create the questions for us. 😂
I tested a script I'm working on in o1 vs r1, and r1 was terrible.
We can determine its IQ by its thinking process, so I don't think questions matter much now.
@wealthysecrets did you use the full model?
The one that's available for free is R1 Lite, which was available a month ago, but I don't know if they've updated their chat to R1 yet.
It wasn't updated as of yesterday
Yeah more questions about Taiwan!
I am excited when i see a new video on R1
Here we go, finally some insane level news!
That's very impressive. I'm mind blown by these advancements. I'm downloading the 7B version on my computer to test it out
Not to be confused with actual thinking. What it's actually doing is laying out how it processed the request and the response, and how it should format the output. So that's just an algorithm.
First test I did on my local copy was.. "How many Rs are there in strawberry?" It reasoned it out and correctly said 3. A local copy! It's unbelievable.
I've never had a local copy that could tell me 3 R's without giving it a clue like use 2 tokens to find the answer or something. This reasoned it in one try.
I was amazed that even a 7B distillation was able to give the right answer on that
@@kevin.malone Me too! It's the first thing I always test these smaller LLMs with and none of them get it right without some help. But this one was perfect!
It's my new favorite local model.
How did the local model perform? What setup do you have? I ask because 670 billion parameters is a ton. I don't think I could pull that off in my home lab.
@@alexjensen990 I'm using LM Studio. You can do a search for models that will run locally. Here is the name of the model I used:
DeepSeek-R1-Distill-Qwen-7B-GGUF/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf
There are small 7b versions that run on shit hardware fine
Running R1 locally, it's done some impressive work.
I am already using it 80% of the time.I love it.
the reasoning is so fun to read for any prompt
it will help people learn how to think. this is brilliant
Just integrated r1 on vscode and this is the first time I feel truly empowered with a local model
It doesn’t support fill in the middle, does it? You mean a sidebar chat in VSCode, yeah?
@@GearForTheYear I'm pretty sure that it doesn't from what I have seen. Besides that would you really want it to go through its verbose thought process every time you wanted a tab to complete type event to occur? This model, from what I have seen so far, is much better suited for an Aider Architect or Cline Planning type role. I look forward to the "de-party programming" of the model so I can start using it. Until a trustworthy unlocked version is available I am not going to touch this thing.
Really? Integration with vscode? Ok I'll have to check that out. I'm so new I didn't know this was a thing. Right on. Thank you.
you are the man for putting us on to a free o1 level model. As far as free models go, grok 2 was the best. now this deepseek 1 opens so many new doors, especially for those who don't want to pay $20 or $200 a month or whatever the price is now. take that Sam Altman
I was curious about the price of this Vultr machine with 8 × AMD MI300X GPUs: it's $2.19/GPU/hr, so $17.52/hour since it has 8 GPUs. That's certainly a lot, but $300 in credits on signup does give you quite a bit of free playtime with this kind of beast. They have many more offerings though, since clearly not everyone would need this much. Even for the full R1 at 671B, a full 1.5TB of RAM just for the GPUs feels like overkill - at least in terms of memory; obviously the GPU compute resources are also a key factor. By the way, a single AMD MI300X seems to be around $10-20k, likely depending on how many you buy at once.
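The per-hour figures above check out; a quick back-of-the-envelope calculation (using the listed $2.19/GPU/hr price as the assumption):

```python
gpu_hourly = 2.19          # USD per GPU-hour, as listed
gpus = 8
machine_hourly = gpu_hourly * gpus
print(f"${machine_hourly:.2f}/hour")  # $17.52/hour for the full machine

# How far the $300 signup credit stretches on this machine:
free_credit_hours = 300 / machine_hourly
print(f"~{free_credit_hours:.1f} hours on the $300 signup credit")
```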
The Nvidia H100 is something like $40k. To think that XAi bought 150,000 H100s. I'm sure Elon didnt pay $40k/H100, but even if it was half that's $3billion...
I was surprised to learn that this time not an NVIDIA but an AMD gpu was used, good for creating some competition among them, nvidia had been almost a monopoly for so long
Thanks for the price estimates. Originally I was thinking around $300K, which is probably about right; if you had any kids and put them through college, you've probably spent $300K on them! LOL. I picked up a Dell 730 with 8 disks and 128 GB of RAM for $400 on eBay, then added a used P40 with 24 GB to play around with AI at home, although I'd probably need about 15 more servers and a power upgrade to run this at home. Open source is great, but I'm limited to 70B models, which run quite slow on my old server; my gaming laptop is pretty good with 8B models.
I have a 48 GB RTX A6000 I got for $3000 and a 32 GB MI50 I got for $200
@ Happy for you, bud!
It is insanely amazing.
I have been stuck on the same issue for 3 weeks with OpenAI.
It solved that in two minutes.
This is super useful for learning things like Math, Physics, or even Coding.
The best thing about this model is exposing Altman as nothing more than a grifter.
The first AI model that correctly guessed the riddle "How can a person be in an apartment and, at the same time, be without a head?" =D
Self-hosted DeepSeek R1 agents will be dangerously good, or bad I guess - depends on the user.
Great demo.
It's big win for open source
Finaly! Love your testing. And wow. What a model!
"He saved up all year to buy the latest Apple" - It even relates to our struggles
I think testing LLMs by asking them to write well-known mini games isn't indicative, because they could simply have the code of those games in their training dataset.
Building Snake is essentially testing against training data at this point. Try building a series of games going from basic to complex that are well known but not used by everyone testing AI models, e.g. Pong, Combat, Breakout, PacMan, Donkey Kong
Grok:
"Deepseek R1 has achieved performance comparable to OpenAI-o1 in technical domains like coding, math, and reasoning.
It uses pure reinforcement learning, marking it as the first open research to validate that reasoning capabilities of LLMs can be incentivized without supervised fine-tuning.
The model is fully open source, allowing global access for examination, modification, and further development.
Deepseek R1 is notably efficient, with an architecture of 671 billion parameters where only 37 billion are active during operation.
It has rapidly gained adoption among top U.S. university researchers and companies, signaling a shift in AI innovation towards China.
Deepseek R1's development and release coincide with discussions on China's growing influence in tech and AI, challenging the status quo."
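The parameter figures in the quote above imply that only a small fraction of the weights fire for each token, which is the point of a mixture-of-experts design. A quick sanity check on the numbers as quoted:

```python
total_params = 671e9    # total parameters, per the quote
active_params = 37e9    # parameters active per token, per the quote
fraction = active_params / total_params
print(f"{fraction:.1%} of parameters active per token")  # ~5.5%
```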
People: "AI is gonna take our jobs"
AI: "I need a 30k setup to code simple Tetris"
They can't really replace anyone besides writers, and some designers if you get good at using image models yourself.
"Cars will never take our horses, cars need millions of dollars in petroleum refining infrastructure"
0.01% of the cost of USA model, is that insane?🤣🤣🤣🤣🤣
To be fair, most of that money goes to top management salaries in the billions - the figureheads who can't code or possess any technical skills.
I thought it took 20% of the compute to train. Was I mistaken? That did sound a bit high to me.
Thanks for your impressive work.
However, since it passes all tests, you need to expand the test set to detect its failures, so that we can compare it later with more advanced models.
What do you think?
Currently downloading the 40gb model, so excited.
Approximately how much did it cost you (or would it have cost you) to run this test suite on Vultr?
Seconded this question.
@@jmg9509 I looked it up and commented about it, but since you're asking: $17.52/hour ($2.19/GPU/hour and the machine has 8 of them). It comes with 1.5TB of RAM just for the GPUs though, and looks like one of the largest machines they offer. With a $300 in credits at signup you might actually be able to reproduce his tests for free at least once, just… don't forget to turn it off when you're done.
I mean, surely it shouldn't cost more than the API, or am I stupid?
Yeah I'd really like to know this. The API prices on deepseek seem unbelievably low given the intelligence of the model. Particularly given that Altman claims openai are losing money on their pro subscriptions.... Is the model just way way more efficient than OpenAI's or do they have access to more affordable compute? (government discount?) Or both?
@@JoshBloodyWilson would also like to know!