Can DeepSeek R1 Run Locally on an NVIDIA RTX 5090!?

  • Published: 2 Feb 2025

Comments • 71

  • @DebugWithLewis
    @DebugWithLewis  2 days ago +4

    Hi everyone! Thanks for watching :) Let me know how you like this format; I'm trying to post a lot more without getting in my head about it.

    • @RayWrightRayrite
      @RayWrightRayrite 2 days ago

      Hi, thanks for this video! I am thinking about getting the RTX 5090 (someday!), so these videos will be of great value! I hope you do more!

    • @bmqww223
      @bmqww223 1 day ago

      I think DeepSeek is very overhyped. It has categorical thinking, and no thinking is better than categorical thinking: it assumes new info by itself and produces erroneous results... It doesn't even understand basic instructions. I told it to translate a copied subtitle text, and even after I taught it to do it right, it intentionally made mistakes when I gave it the entire text.

  • @Dom-zy1qy
    @Dom-zy1qy 2 days ago +51

    No joke, in this first release batch of 50-series cards, I think NVIDIA unironically shipped out more review samples to YouTubers than they did to retailers.
    Maybe not too surprising, I guess. If stock is low, may as well build hype instead of selling a few hundred additional GPUs.

  • @Vimblini
    @Vimblini 14 hours ago +2

    RTX 5090 and DeepSeek in the same title is bound to go viral

  • @akam9919
    @akam9919 2 days ago +13

    The fact that the AI reports having a whole internal debate about how many R's are in strawberry.
    It's six btw

  • @Metarig
    @Metarig 11 hours ago +2

    You can run 32B on a 3090 without any issues, and it runs smoothly.

  • @miguelito5602
    @miguelito5602 2 days ago +22

    Well technically that's not the real model 🤓☝

    • @saiuan4562
      @saiuan4562 2 days ago

      Okay, dip****

    • @DebugWithLewis
      @DebugWithLewis  2 days ago +7

      🤬🤬🤬🤬

    • @amihartz
      @amihartz 1 day ago +1

      I don't know why people say this, all the models are "real" models, they're just different. It would make more sense to say that it is not the "original" model, because the Distill models were produced by taking things like Llama or Qwen and readjusting their weights based on synthetic data generated from R1, so the weights are a hybrid of the two models (either a hybrid of Qwen+R1 or Llama+R1 depending on which you download), but they are still "real" models, just not the original R1. I don't know what it would even mean to have a "fake" model. (A sketch of that recipe follows this thread.)

    • @poisonza
      @poisonza 1 day ago +1

      ??? So when you train on the output of the o1 model, the model suddenly becomes o1?? Naw, it's just Qwen2 fine-tuned via GRPO

    • @amihartz
      @amihartz 23 hours ago +1

      @ You literally are changing the weights of the model, it is no longer the same model. To claim that a modified qwen2 is literally identical to qwen2 is easily falsified just by running the "diff" command on the two model files. They are different models. If you adjusted qwen2's weights based on the output of o1, it would neither be qwen2 nor o1, but would be a new model that is hybrid between them and would take on characteristics of both, as this literally causes the model to acquire information and properties from o1.
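
  A minimal sketch of the distillation recipe @amihartz describes above, for readers who want the mechanics. Everything here is a stand-in: gpt2 plays the teacher (R1 in the real recipe), distilgpt2 plays the student (Qwen/Llama), and a single gradient step stands in for a full fine-tuning run.

      # Toy sequence-level distillation (hypothetical stand-in models).
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      teacher_tok = AutoTokenizer.from_pretrained("gpt2")        # stand-in for R1
      teacher = AutoModelForCausalLM.from_pretrained("gpt2")
      student_tok = AutoTokenizer.from_pretrained("distilgpt2")  # stand-in for Qwen/Llama
      student = AutoModelForCausalLM.from_pretrained("distilgpt2")

      # 1) The teacher generates synthetic data (for R1: long reasoning traces).
      prompt = "Q: How many r's are in 'strawberry'?\nA:"
      inputs = teacher_tok(prompt, return_tensors="pt")
      trace = teacher.generate(**inputs, max_new_tokens=48, do_sample=True)
      synthetic_text = teacher_tok.decode(trace[0])

      # 2) The student's weights are readjusted by ordinary next-token training
      #    on that synthetic data, so the result carries information from both
      #    models: neither the original student nor the teacher.
      optim = torch.optim.AdamW(student.parameters(), lr=1e-5)
      batch = student_tok(synthetic_text, return_tensors="pt")
      student(**batch, labels=batch["input_ids"]).loss.backward()
      optim.step()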

  • @BrentLeVasseur
    @BrentLeVasseur 5 hours ago +1

    Honestly I’m fed up with these 5090 videos. The only fricken people in the world that can actually get their hands on these cards are YouTube reviewers! I think I might start my own channel, just so I can get a GPU. 😂

  • @paulroberts7429
    @paulroberts7429 54 minutes ago +1

    Thanks to DeepSeek, we now know Nvidia is using AI to squeeze these chips; these cards are a rehash with GDDR7.

  • @agush22
    @agush22 1 day ago +3

    Why is 14B using so much of your VRAM?
    I can run it on a 16GB card with a couple gigs of slack

  • @anshulsingh8326
    @anshulsingh8326 1 day ago +1

    fp16 and q8, do they have a difference in output?

  • @margovincent22x
    @margovincent22x 1 day ago +2

    Please run 70B. Also, can we use two GPUs for more speed and accuracy?

  • @gaswegn
    @gaswegn 2 days ago +5

    The issue with trusting AI is that we taught it how to process data and trained it to give outputs, but we don't know the processes between the two;
    it's considered the black box on some channels.
    It's interesting that DeepSeek does the thought-process thing before giving you the real output. It's aimed at transparency, to give you insight into the black box, but now there's the question of how the output of the thought process was generated. Still the unknown black-box issue, but a clever idea. (See the sketch after this thread.)

    • @BineroBE
      @BineroBE 2 days ago +1

      Eh, the thought box is not what it actually "thinks". It just answers the prompt a first time, then summarises that into the "real" answer. There is no thinking going on; we know _how_ it works. We just don't really understand why it works so well.

    • @randomseer
      @randomseer 2 days ago

      It's still a black box; it's generating its reasoning the same way it generates the final output

    • @gaswegn
      @gaswegn 1 day ago

      @randomseer Well, before, you weren't sure what biases were in play for your final output; now you get a false window into how it came up with what it said. But even that solution has a new black box: we still don't fully understand its interpretation and biases, because the output of its DeepThink process is still unexplainable
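
  Since this thread is about what the "thought box" actually is, here is a minimal sketch of pulling one apart, assuming the ollama Python client and a locally pulled deepseek-r1:14b. R1-style models emit the trace between <think> tags before the answer; both parts are ordinary sampled output, not a log of internal computation.

      # Split an R1-style reply into its <think> trace and the final answer.
      import re
      import ollama  # assumes `pip install ollama` and a running Ollama server

      resp = ollama.chat(
          model="deepseek-r1:14b",
          messages=[{"role": "user", "content": "How many r's are in strawberry?"}],
      )
      text = resp["message"]["content"]

      m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
      trace = m.group(1).strip() if m else ""
      answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

      print("reasoning trace:\n", trace)
      print("final answer:\n", answer)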

  • @AnirbanKar4294
    @AnirbanKar4294 22 hours ago +1

    It's not about the quantization, it's about the actual model at its 671B parameter size. Even if you run it at q4 it's still much better than all these distill versions, because its base model was DeepSeek V3, which is a very good model. And I know it's not for a home lab, at least for now, but there are ways it can run at 1.58 bit with Unsloth's method, which requires 131GB of VRAM instead of 741GB. (Rough math below.)
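
  Rough weight-only math behind those numbers, assuming a simple parameters x bits-per-weight estimate (a real deployment adds KV cache and runtime overhead on top, so treat these as floors):

      # Back-of-envelope model-weight memory: params * bits / 8.
      def weight_gb(params_billions: float, bits: float) -> float:
          return params_billions * 1e9 * bits / 8 / 1e9

      for name, params, bits in [
          ("R1 671B, fp8",        671, 8.0),   # ~671 GB, weights alone
          ("R1 671B, q4",         671, 4.0),   # ~336 GB
          ("R1 671B, 1.58-bit",   671, 1.58),  # ~133 GB, near the 131GB cited
          ("32B distill, ~5-bit",  32, 5.0),   # ~20 GB, matching the Ollama download
          ("14B distill, ~4-bit",  14, 4.0),   # ~7 GB
      ]:
          print(f"{name:22s} ~{weight_gb(params, bits):6.1f} GB")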

  • @AndrewTSq
    @AndrewTSq 1 day ago +1

    Are you really using the correct DeepSeek R1?? I use the one from Ollama, and it had no problems answering the questions on the 7B model. Also, the 32B model is only 20GB

    • @amihartz
      @amihartz 1 day ago +1

      He might've downloaded the unquantized version. (A quick way to check is sketched after this thread.)

    • @AndrewTSq
      @AndrewTSq 1 day ago

      @@amihartz Aahh, yes, did not think about that :)
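
  One way to settle the quantized-vs-unquantized question in this thread, assuming the ollama Python client: the show call reports which build a tag actually resolved to.

      # Inspect a pulled model's parameter size and quantization level.
      import ollama  # assumes a running Ollama server with the model pulled

      info = ollama.show("deepseek-r1:7b")
      details = info["details"]
      print(details["parameter_size"], details["quantization_level"])
      # Default library tags are typically ~4-bit builds; full-precision
      # variants are separate, much larger tags.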

  • @souljeah
    @souljeah 5 hours ago

    What are your PC specs?

  • @themohammadsayem
    @themohammadsayem 1 day ago

    I have a very old laptop, and running the 7B model makes it go bonkers. I am looking to shift to a Mac mini M4. For running the 14B model, will 16GB be enough, or should I go for 24/32?

    • @prof2k
      @prof2k 1 day ago

      The more the better, honestly. But 16 does me really well. Just can't go any higher than the base sizes.

    • @agush22
      @agush22 1 day ago

      Macs have unified memory, so the VRAM is also your system RAM. 14B is around 11GB, so you would only have 5GB left for macOS and whatever else you are working on

    • @themohammadsayem
      @themohammadsayem 1 day ago

      @@prof2k Which parameter size are you running right now?

    • @themohammadsayem
      @themohammadsayem 1 day ago

      @@agush22 So 24/32GB would be better for running 14B?

  • @taqin2
    @taqin2 2 days ago +1

    I run that 33B model on an RTX 4070 Super; it really has amazing performance

  • @MrAbdoabd
    @MrAbdoabd 1 day ago

    Which model does the DeepSeek web chat use itself?

  • @alby13
    @alby13 21 hours ago

    I want you to run more AI models locally on your PC.

  • @Waszzup
    @Waszzup 2 days ago +2

    How has nobody found this??

  • @Matrriosh
    @Matrriosh 55 minutes ago

    I just installed the full 32B model, and I have a Sapphire RX 7900 XT 20GB Nitro+, and it runs

  • @dave24-73
    @dave24-73 1 day ago

    Yes, it can even run locally without a GPU. Clearly performance is affected, but it can run.

  • @edwardm9975
    @edwardm9975 23 hours ago

    Are you still acting, Mr Daniel Day-Lewis? Any new movies coming?

  • @kakashi7119
    @kakashi7119 2 days ago

    Make a video on how to quantize any DeepSeek model

  • @krinodagamer6313
    @krinodagamer6313 1 day ago

    I got it running on a TITAN X Pascal. Of course it will run; I even run it in my application

  • @val_bld
    @val_bld 1 day ago

    I just ran DeepSeek on my mid-2017 MacBook Pro with the worst Intel CPU

  • @craftmyne
    @craftmyne 2 days ago

    I got 32B running on my M2. Granted, it's slow as balls, but if I close almost everything it'll run; 14B is almost usable, and anything lower runs like the wind.
    Looking at your memory usage is bizarre. Maybe I don't have a context window set up, but my 7700 XT can also run 14B but not 32B, and my Mac has 24GB of RAM, letting it pull 32B.
    Nvm, I have quantised versions of the models

    • @randomseer
      @randomseer 2 days ago

      Even a phone can run 1.5B-7B

    • @AndrewTSq
      @AndrewTSq 1 day ago

      Same here, but my Ollama DeepSeek did not have any problems with the questions either, so it's weird that his could not even answer the strawberry question correctly :)

  • @shock-blitz77
    @shock-blitz77 1 day ago

    No, DeepSeek cannot run on an RTX 5090, but it can on a Raspberry Pi

  • @CO8848_2
    @CO8848_2 1 day ago

    I can run 7B on my 3070 pretty well, so why pay more?

  • @Phil-D83
    @Phil-D83 14 hours ago

    Intel B580 24GB with ZLUDA

  • @alkeryn1700
    @alkeryn1700 1 day ago +1

    No, you can't.
    The distills are not "versions" of the model.

    • @amihartz
      @amihartz 1 day ago

      They are hybrids of R1 and other models (either Llama or Qwen depending on the one you download), their weights containing information from both models they were created from. I don't think it is unreasonable to say something like DeepSeek R1 Qwen Distill is a "version of R1," and equally I would not think it is very unreasonable to say it is a "version of Qwen," both statements are true since it's a hybrid of the two. It is being oddly nitpicky to try and fight against this.

    • @alkeryn1700
      @alkeryn1700 16 hours ago

      @@amihartz Sure, but it cannot be compared to the real R1; they are not the same model.

    • @jeffwads
      @jeffwads 15 hours ago

      You are correct, but 99.9% just can't grasp that the distilled models are Qwen or Llama. Heck, it even states the arch in this video as such, and people still think it's R1. Notice the other one in this thread yapping about it being a hybrid, etc. Sigh.

    • @amihartz
      @amihartz 15 hours ago

      @ They are objectively not Qwen or Llama; this is easy to prove just by running the "diff" command between the models, and you will see they are different. The models are R1 Qwen Distill and R1 Llama Distill, not Qwen or Llama, nor are they R1. You are spreading provably false misinformation.

    • @alkeryn1700
      @alkeryn1700 14 hours ago

      @ They are Qwen- and Llama-based. Yes, the weights have been changed, but it does not matter:
      if you do a distance analysis, they are very, very close. (A sketch of both checks follows this thread.)
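
  For the "diff" and the "distance analysis" both sides invoke, here is a minimal sketch, assuming two checkpoints with matching architectures saved as PyTorch state dicts (the file names are placeholders, not real release artifacts):

      # Byte-level equality vs. a crude weight distance between two checkpoints.
      import torch

      a = torch.load("qwen2-7b.pt", map_location="cpu")            # hypothetical path
      b = torch.load("r1-distill-qwen-7b.pt", map_location="cpu")  # hypothetical path

      # The "diff": are the tensors literally identical? (False after distillation.)
      print("identical:", all(torch.equal(a[k], b[k]) for k in a))

      # The "distance analysis": cosine similarity of the flattened weights,
      # which can be very high even though the models are not the same.
      va = torch.cat([t.flatten().float() for t in a.values()])
      vb = torch.cat([t.flatten().float() for t in b.values()])
      print("cosine similarity:", torch.nn.functional.cosine_similarity(va, vb, dim=0).item())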

  • @larrowvolru7204
    @larrowvolru7204 1 day ago

    Hum... maybe two Radeon RX 9070 XTs could run it even better

  • @tringuyen7519
    @tringuyen7519 20 hours ago

    So you bought a 5090 from the scalpers just to run DeepSeek distilled models locally, not gaming? Seriously?

    • @demolicous
      @demolicous 18 hours ago +3

      Why is that an issue?

    • @03chrisv
      @03chrisv 18 hours ago +1

      5090s are better suited to AI workloads than to gaming. What game even needs anything close to 32GB of VRAM? 😂 Most games use between 8GB and 12GB, with only a very select few that even use 16GB, and that usually involves full path tracing. The 5090 is literally using a binned GB202 die used in AI workstations.

  • @gaswegn
    @gaswegn 2 days ago

    It's so annoying that Nvidia crashed because DeepSeek R1, a highly hallucinating copy of a copy (trained on GPT's outputs), benchmarked alongside GPT.
    Why sell Nvidia? Because you can run it on M2s? Cool, that means you can run it even better with 5090s.
    Nvidia is down 600 billion for what?

    • @Dom-zy1qy
      @Dom-zy1qy 2 days ago +1

      It'll probably trickle back up. It's investor panic by people who aren't really informed about technology and the implications of certain things.
      I do think NVIDIA is pretty risky though. I think if it takes AI too long to become profitable, people will pull out.

    • @randomseer
      @randomseer 2 days ago

      It's not about Nvidia consumer GPUs; it's about the fact that it was trained with a lot fewer Nvidia GPUs than people expected. The primary reason Nvidia is valued is for the GPUs they use for training

    • @geezher
      @geezher 23 hours ago

      @@randomseer This. If companies are telling investors they need, say, 2 million GPUs to run ChatGPT, but an alternative comes along that shows you only need 1/10th of those, well, then the demand for said GPUs might be a lot less than the 2 million... That, and the fact that smaller models that are just as accurate can run on competitors' products (say, AMD GPUs or Apple M-series stuff), means that demand for Nvidia might actually be even lower. Lower-than-forecast demand combined with alternatives might mean that the moat Nvidia has is non-existent. Those could be the reasons Nvidia dropped.