Hi everyone! Thanks for watching :) Let me know how you like this format, trying to post a lot more but not get in my head about it.
Hi, thanks for this video! I am thinking about getting the RTX 5090 (someday!), so these videos will be of great value! I hope you do more!
I think DeepSeek is very overhyped. It has categorical thinking, and no thinking is better than categorical thinking: it assumes new info by itself and produces erroneous results... It doesn't even understand basic instructions. I told it to translate a copied subtitle text, and even after I taught it to do it right, it intentionally made mistakes when I gave it the entire text.
No joke, in this first release batch of 50-series cards, I think NVIDIA unironically shipped more review samples to YouTubers than they did to retailers.
Maybe not too surprising, I guess. If stock is low, may as well build hype instead of selling a few hundred additional GPUs.
RTX 5090 and DeepSeek in the same title is bound to go viral.
The fact that the AI reports having a whole internal debate about how many R's are in strawberry.
It's six btw
You can run 32B on a 3090 without any issues, and it runs smoothly.
Well technically that's not the real model 🤓☝
Okay, dip****
🤬🤬🤬🤬
I don't know why people say this; all the models are "real" models, they're just different. It would make more sense to say that it is not the "original" model. The Distill models were produced by taking things like Llama or Qwen and readjusting their weights based on synthetic data generated from R1, so the weights are a hybrid of the two models (either Qwen+R1 or Llama+R1, depending on which you download). But they are still "real" models, just not the original R1. I don't know what it would even mean to have a "fake" model.
??? So when you train on the output of the o1 model, suddenly the model becomes o1?? Naw, it's just Qwen2 fine-tuned via GRPO.
@ You literally are changing the weights of the model, it is no longer the same model. To claim that a modified qwen2 is literally identical to qwen2 is easily falsified just by running the "diff" command on the two model files. They are different models. If you adjusted qwen2's weights based on the output of o1, it would neither be qwen2 nor o1, but would be a new model that is hybrid between them and would take on characteristics of both, as this literally causes the model to acquire information and properties from o1.
Honestly I'm fed up with these 5090 videos. The only fricken people in the world that can actually get their hands on these cards are YouTube reviewers! I think I might start my own channel, just so I can get a GPU. 😂
Thanks to DeepSeek we now know Nvidia is using AI to squeeze these chips; these cards are a rehash with GDDR7.
why is 14b using so much of your VRAM?
I can run it on a 16gb card with a couple gigs of slack
oh it's not quantized
@@agush22 lmao
fp16 vs q8, is there a difference in output?
Please run 70B. Also, can we use two GPUs for more speed and accuracy?
The issue with trusting AI is that we taught it how to process data and trained it to give outputs, but we don't know the processes between the two.
It's what gets called the black box on some channels.
It's interesting that DeepSeek does the thought-process thing before giving you the real output. It's aimed at transparency, to give you insight into the black box, but now there's the question of how the output of the thought process itself was generated. Still the unknown black-box issue, but a clever idea.
Eh, the thought box is not what it actually "thinks". It just answers the prompt a first time and then summarises it into the "real" answer. There is no thinking going on; we know _how_ it works. We just don't really understand why it works so well.
It's still a black box; it's generating its reasoning the same way it generates the final output.
@randomseer Well, before, you weren't sure what biases were in play for the final output; now you get a false window into how it came up with what it said. But even that solution has a new black box: we still don't fully understand its interpretation and biases, because the output of its DeepThink process is still unexplainable.
It's not about the quantization, it's about the actual model's parameter size: 671B. Even if you run it at Q4, it's still much better than all these distill versions, because its base model was DeepSeek V3, which is a very good model. And I know it's not for a home lab, at least for now, but there are ways to run it at 1.58-bit with Unsloth's method, which requires 131GB of VRAM instead of 741GB.
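Rough math behind those VRAM numbers (a back-of-the-envelope sketch; the real Unsloth 1.58-bit footprint lands a bit higher since some layers stay at higher precision, and the KV cache comes on top):

```python
# Naive weight-memory estimate: parameters * bits-per-weight / 8 bytes.
# Ignores KV cache, activations, and runtime overhead.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8  # decimal GB

for bits in (8, 4, 1.58):
    print(f"671B @ {bits}-bit ≈ {weight_gb(671, bits):.0f} GB of weights")
# 671B @ 8-bit    ≈ 671 GB
# 671B @ 4-bit    ≈ 336 GB
# 671B @ 1.58-bit ≈ 133 GB
```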
Are you really using the correct DeepSeek R1?? I use the one from Ollama, and it had no problems answering the questions with the 7B model. Also, the 32B model is only 20GB.
He might've downloaded the unquantized version.
@@amihartz aahh, yes did not think about that :)
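For reference, rerunning those questions against one of the Ollama distill tags from Python looks roughly like this (just a sketch: it assumes the `ollama` Python client is installed, `ollama serve` is running, and the `deepseek-r1:7b` tag has already been pulled; the tag names come from Ollama's library and may change):

```python
import ollama  # pip install ollama; talks to a local `ollama serve` instance

# The 7B/14B/32B tags are the distilled models, not the full 671B R1.
response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "How many r's are in the word strawberry?"}],
)
print(response["message"]["content"])
```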
what are your PC specs?
I have a very old laptop, and running the 7B model makes it go bonkers. I'm looking to shift to a Mac mini M4. For running the 14B model, will 16GB be enough, or should I go for 24/32?
The more the better honestly. But 16 does me really well. Just can't go any higher than the base sizes.
Macs have unified memory, so the VRAM is also your system RAM. 14B is around 11GB, so you'd only have about 5GB left for macOS and whatever else you're working on (rough math below).
@@prof2k which parameter are you running right now?
@@agush22 So 24/32GB would be better for running 14B?
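For the 16 vs 24/32GB question, a rough sanity check (weight size only, using the ~11GB figure above, which is roughly a 6-bit quant; real headroom also depends on context length):

```python
# Unified memory left over once the model weights are loaded.
def leftover_gb(ram_gb: float, params_billion: float, bits_per_weight: float) -> float:
    weights_gb = params_billion * bits_per_weight / 8  # decimal GB
    return ram_gb - weights_gb

for ram in (16, 24, 32):
    print(f"{ram} GB RAM -> ~{leftover_gb(ram, 14, 6):.1f} GB left for macOS and apps")
# 16 GB -> ~5.5 GB left, 24 GB -> ~13.5 GB, 32 GB -> ~21.5 GB
```

So 16GB works but is tight; 24 or 32GB gives comfortable headroom.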
I run that 33B model on an RTX 4070 Super; it really has amazing performance.
Which model does the DeepSeek web chat itself use?
I want you to run more AI models locally on your PC.
How has nobody found this??
new channel
@DebugWithLewis cool
I just installed the full 32B model on my Sapphire RX 7900 XT 20GB Nitro+, and it runs.
Yes, it can even run locally without a GPU. Performance clearly takes a hit, but it can run.
Are you still acting, Mr Daniel Day-Lewis? Any new movies coming?
Make a video on how to quantize any DeepSeek model.
I got it running on a TITAN X Pascal, so of course it will. I even run it in my application.
I just ran DeepSeek on my mid-2017 MacBook Pro with the worst Intel CPU.
I got 32B running on my M2. Granted, it's slow as balls, but if I close almost everything it'll run; 14B is almost usable and anything lower runs like the wind.
Looking at your memory usage is bizarre. Maybe I don't have context windows set up, but my 7700 XT can also run 14B but not 32B, and my Mac has 24GB of RAM, letting it pull 32B.
Nvm, I have quantised versions of da models.
Even a phone can run 1.5B to 7B.
Same here, but my Ollama DeepSeek didn't have any problems with the questions either. So weird that his couldn't even answer the strawberry question correctly :)
No, DeepSeek cannot run on an RTX 5090, but it can on a Raspberry Pi.
I can run 7B on my 3070 pretty well, so why pay more?
Intel B580 24GB with ZLUDA.
no, you can't.
The distills are not "versions" of the model.
They are hybrids of R1 and other models (either Llama or Qwen depending on the one you download), their weights containing information from both models they were created from. I don't think it is unreasonable to say something like DeepSeek R1 Qwen Distill is a "version of R1," and equally I would not think it is very unreasonable to say it is a "version of Qwen," both statements are true since it's a hybrid of the two. It is being oddly nitpicky to try and fight against this.
@@amihartz Sure, but it cannot be compared to the real R1; they are not the same model.
You are correct, but 99.9% of people just can't grasp that the distilled models are Qwen or Llama. Heck, it even states the arch in this video as such and people still think it's R1. Notice the other one in this thread yapping about it being a hybrid, etc. Sigh.
@ They are objectively not Qwen or Llama, this is easy to prove just by doing the "diff" command between the models, you will see they are different. The models are R1 Qwen Distill and R1 Llama Distill, not Qwen or Llama, nor are they R1. You are spreading provably false misinformation.
@ They are Qwen- and Llama-based; yes, the weights have been changed, but it does not matter.
If you do a distance analysis, they are very, very close.
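If anyone actually wants to run that comparison, here is a minimal sketch (assuming both checkpoints are available locally as single safetensors files with matching tensor names; the paths below are hypothetical placeholders, and real 7B checkpoints are usually sharded):

```python
import torch
from safetensors.torch import load_file

# Hypothetical local paths; the distill and its base share the same architecture,
# so tensor names and shapes should line up one-to-one.
base = load_file("qwen2.5-7b/model.safetensors")
distill = load_file("deepseek-r1-distill-qwen-7b/model.safetensors")

sims = []
for name, w in base.items():
    if name in distill and distill[name].shape == w.shape:
        sims.append(torch.nn.functional.cosine_similarity(
            w.flatten().float(), distill[name].flatten().float(), dim=0
        ).item())

print(f"compared {len(sims)} tensors, mean cosine similarity {sum(sims) / len(sims):.4f}")
```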
Hmm... maybe two Radeon RX 9070 XTs could run it even better.
So you bought a 5090 from the scalpers just to run DeepSeek distilled models locally, not gaming? Seriously?
Why is that an issue?
5090s are better suited to AI workloads than they are to gaming. Like, what game even needs anything close to 32GB of VRAM? 😂 Most games use between 8GB and 12GB, with only a very select few that even use 16GB, which usually involves full path tracing. The 5090 is literally using a binned GB202 die used in AI workstations.
It's so annoying that Nvidia crashed because DeepSeek R1, a highly hallucinating copy of a copy (trained on GPT's outputs), benchmarked alongside GPT.
Why sell Nvidia? You can run it on M2s? Cool, that means you can run it even better with 5090s.
Nvidia is down 600 billion for what?
It'll probably trickle back up. It's investor panic by people who aren't really informed about technology and the implications of certain things.
I do think NVIDIA is pretty risky though. I think if it takes AI too long to become profitable, people will pull out.
It's not about Nvidia consumer GPUs, it's about the fact that it was trained with a lot fewer Nvidia GPUs than people expected. The primary reason Nvidia is valued is for the GPUs they use for training.
@@randomseer This. If companies are telling investors they need, say, 2 million GPUs to run ChatGPT, but an alternative comes along that shows you only need 1/10th of those, well, then the demand for said GPUs might be a lot less than the 2 million... That, and the fact that smaller models that are just as accurate can run on competitors' products (say, AMD GPUs or Apple M-series stuff), means that demand for Nvidia might actually be even lower. Lower-than-forecast demand combined with alternatives might mean the moat Nvidia has is non-existent. Those could be the reasons Nvidia dropped.