How to Turn Your AMD GPU into a Local LLM Beast: A Beginner's Guide with ROCm

  • Published: 20 Sep 2024

Comments • 116

  • @misterpdj
    @misterpdj 6 months ago +23

    Thanks for letting us know about this new release. Just tried it on my 6800 XT, and it works. FYI, I think the supported list is all Navi 21 cards and all RDNA 3. That's the same list as the HIP SDK supported cards on the AMD ROCm Windows System Requirements page.

    • @JoseRoberto-wr1bv
      @JoseRoberto-wr1bv 5 months ago +2

      How many tokens/s? Using a 7B model?

    • @chaz-e
      @chaz-e 5 months ago

      And the 7600 XT is not part of the official supported list.

    • @misterpdj
      @misterpdj 4 months ago +3

      @@JoseRoberto-wr1bv On the Q8_0 version of Llama 3 I was getting 80 t/s, but for a couple of reasons the quality wasn't so good. I'm using Mixtral Instruct as my daily driver, and getting 14-18 t/s depending on how I balance offload vs context size.

    • @misterpdj
      @misterpdj 4 months ago +1

      @@chaz-e That and the 7600 are both gfx1102.

  • @ThePawel36
    @ThePawel36 5 months ago +12

    I've successfully run 70B models with 4-bit quantization on my 4070 Ti Super. I offload 27 of the model's 80 layers to the GPU, while the remainder runs from system RAM. It works quite well: not exceedingly fast, but fast enough for comfortable use. A minimum of 64GB of RAM is required. While VRAM matters, in practice you can run 70B networks with even 10GB of VRAM or less. It ultimately comes down to how long you're willing to wait for the model's responses (see the sketch after this thread).

    • @gomesbruno201
      @gomesbruno201 3 months ago

      It would be nice to try on an AMD equivalent, maybe a 7800 XT or 7900 XT.

    • @ferluisch
      @ferluisch 2 months ago

      Maybe he can share his tok/s. Comparing my 2080 (non-Super) against the video's RX 7600: I get 62.40 tok/s, which is around 40% faster for a card with around the same gaming performance. VRAM usage seems a bit lower though (baseline was 1.8GB, and with the same model loaded it was 7.2GB, so around 5.4GB of VRAM for the model). Hopefully AMD can catch up in the future :(

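The partial-offload setup @ThePawel36 describes maps directly onto llama.cpp's n_gpu_layers option. Here is a minimal sketch using llama-cpp-python (LM Studio wraps the same llama.cpp engine); the model path, layer count, and context size are illustrative assumptions, not settings shown in the video:

    # Partial GPU offload: 27 of ~80 layers in VRAM, the rest in system RAM.
    # Assumes a ROCm/hipBLAS (or CUDA) build of llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-70b.Q4_K_M.gguf",  # hypothetical 4-bit GGUF file
        n_gpu_layers=27,  # fewer offloaded layers -> less VRAM, slower tokens
        n_ctx=4096,       # a larger context window also consumes VRAM
    )

    out = llm("Q: Why offload only some layers?\nA:", max_tokens=64)
    print(out["choices"][0]["text"])

Raising n_gpu_layers until the load fails, or the machine starts swapping, is the usual way to trade response time against VRAM.
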
  • @djmccullough9233
    @djmccullough9233 5 months ago +10

    It works awesome on the 6800 XT. Thank you for the guide.

    • @agx4035
      @agx4035 4 months ago +2

      Is it as fast as in the video?

    • @bankmanager
      @bankmanager 4 months ago +5

      @@agx4035 The video accurately shows expected performance, yes.

    • @CapaUno1322
      @CapaUno1322 1 month ago +1

      Just picked up a 16GB 6800; can't wait to get it installed and see what this baby can do! ;D

    • @Helldiver111
      @Helldiver111 1 month ago

      @@CapaUno1322 Any update?

  • @cj_zak1681
    @cj_zak1681 6 months ago +6

    Brilliant! Thanks for letting us know, I am excited to try this.

  • @sebidev
    @sebidev 29 days ago +2

    Thank you for the video. I can now use 8B LLM models with my AMD RX 7600 (8GB) and it is really fast. I use Arch Linux and it runs without any problems 👍

    • @puffin11
      @puffin11 25 days ago

      How did you get it to work on Linux? I've been having issues (and Ollama seems to recommend the proprietary AMD drivers...).

    • @sebidev
      @sebidev 25 days ago

      @@puffin11 Don't install the AMD Pro (proprietary) drivers. The open-source amdgpu driver is completely sufficient with ROCm (see the quick check below).

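A quick sanity check that the open-source amdgpu driver plus ROCm is actually visible to user-space tooling; this assumes a ROCm build of PyTorch is installed, and is a sketch rather than a step shown in the video:

    # On ROCm builds of PyTorch, the torch.cuda API is backed by HIP,
    # so these calls report the AMD GPU.
    import torch

    print(torch.cuda.is_available())      # True on a working ROCm install
    print(torch.cuda.get_device_name(0))  # e.g. "AMD Radeon RX 7600"
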
  • @joshuat6124
    @joshuat6124 5 months ago +1

    Amazing video, I learnt a lot! I love these videos about commercial GPUs running AI/ML workloads, as I'm into developing AI/ML models.

  • @pedromartins4847
    @pedromartins4847 6 months ago +3

    Will be trying this out later on, thank you my man.

  • @myroslav6873
    @myroslav6873 1 month ago +1

    Thanks, worked very well on my 6800 XT! The answers are as quick as in the video. But I guess I need to learn how and what to ask, because the answers were always very confident and always completely wrong and made up. I asked the chat to list French kings who were married off before they were 18, and it invented a bunch of kings who never lived, and said that Emperor Napoleon Bonaparte and President Macron were both married off at 16; they were not kings, technically, and they were certainly not married at 16, lol.

  • @MiguelGonzalez-nv2rt
    @MiguelGonzalez-nv2rt 5 months ago +4

    It's not working for me. I have a 7900 XT installed and attempted the same as you, but it just gives an error message for no apparent reason. Drivers are up to date and everything is in order, but nothing.

  • @jakeastles
    @jakeastles 2 months ago +1

    Thanks, this is the only good video I could find on YouTube that explained everything easily. Your accent helped me focus. Very useful stuff.

    • @TechteamGB
      @TechteamGB  2 months ago

      Thank you! Glad to be helpful :D

  • @VasilijP
    @VasilijP 1 month ago +1

    Well, it is not as if GPGPU arrived only with LLMs. OpenCL on AMD GPUs in 2013 and earlier was the most viable option for crypto mining, while Nvidia was too slow at the time due to small cache sizes and poor efficiency. That all changed with the 750 Ti and the GTX 9xx generation of cards. The history of GPU programming is even longer than that, as people tried to bend even fixed-pipeline GPUs into calculating things unrelated to graphics. The GeForce 8 with early, limited CUDA was of course a game changer, and I have been a big fan of CUDA and OpenCL ever since. Thanks for a great video on the 7600 XT! ❤

  • @aurimasc5333
    @aurimasc5333 6 days ago

    Works just fine with the RX 5700 XT; it responds decently fast.

  • @BigFarm_ah365
    @BigFarm_ah365 5 months ago +2

    Seeing as I spent last night trying to install ROCm without any luck, and couldn't find any good tutorials or a single success story, I'll be curious to see how insanely easy this is. Wait, I don't need to install and run ROCm in WSL?

    • @bankmanager
      @bankmanager 4 months ago +1

      Hey, I've had success with ROCm 5.7/6.0/6.1.1 on Ubuntu and 5.7 on Windows, so let me know if you're still having an issue and I can probably point you in the right direction.

  • @Machistmo
    @Machistmo 5 days ago

    I have a 6800 XT, a 6900 XT, and a 7900 XT. I will attempt this on each.

  • @ols7462
    @ols7462 5 months ago +4

    As a total dummy in all things LLM, your video was the catalyst I needed to entertain the idea of learning about all this AI stuff. I'm wondering, and this would be a greatly appreciated video if you make it: is it possible to put this GPU in my streaming PC so that it encodes and uploads the stream while simultaneously running a local LLM that interacts with the chat on Twitch? How can I integrate these models with my Twitch streams?

  • @losttale1
    @losttale1 4 months ago +4

    GPU not detected on RX 6800, Windows 10. Edit: never mind, you must load the model first from the top center.

    • @CapaUno1322
      @CapaUno1322 1 month ago +1

      Good news! ;D

    • @nicholasfall838
      @nicholasfall838 1 month ago +1

      What do you mean by "first from the top center"? I couldn't get ROCm to recognize my GPU either, but that was through WSL 2, not this app.

  • @fretbuzzly
    @fretbuzzly 3 months ago +2

    This is cool, but I have to say that I'm running Ollama with Open WebUI and a 1080 Ti and I get similarly quick responses. I would assume a newer card would perform much better, so I'm curious where the performance of the new cards really matters for just chatting, if at all.

    • @leucome
      @leucome 3 months ago +1

      If you add voice generation then it matters a lot. With no voice, anything over 10 tokens/sec is pretty usable.

  • @CP-oo8mj
    @CP-oo8mj 1 month ago

    You couldn't load the 30B-parameter one because in your settings you're trying to offload all layers to your GPU. Play with the settings and try reducing the GPU offload to find your sweet spot, as sketched below.

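A hedged way to automate that sweet-spot search, again using llama-cpp-python in place of LM Studio's GPU-offload slider (the GGUF filename and step size are made up for illustration):

    # Step the offload count down until the model loads within VRAM.
    # Caveat: some native out-of-memory failures abort the process rather
    # than raise, so treat this as a rough starting point, not a guarantee.
    from llama_cpp import Llama

    for n in range(80, -1, -8):  # try 80, 72, ... down to 0 offloaded layers
        try:
            llm = Llama(model_path="model-30b.Q4_K_M.gguf", n_gpu_layers=n)
            print(f"Loaded with {n} layers on the GPU")
            break
        except Exception as err:
            print(f"{n} layers failed: {err}")
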
  • @crypto_que
    @crypto_que 1 month ago

    Today I finally jumped off the AMD Struggle Bus and installed an NVIDIA GPU that runs AI like a boss. Instead of waiting SECONDS for two AMD GPUs to SHARE 8GB of memory via torch and pyenv and BIFURCATION software…
    My RTX 4070 Super just does the damn calculations right THE FIRST TIME!

  • @mysticalread
    @mysticalread 3 months ago +1

    Can you add multiple AMD GPUs together to increase the power?

  • @robertmiller1638
    @robertmiller1638 2 months ago

    Incredible video!

  • @sailorbob74133
    @sailorbob74133 5 months ago +2

    Can you do an update when ROCm 6.1 is integrated into LM Studio?

    • @bankmanager
      @bankmanager 4 months ago +1

      6.1 is not likely to ever be available on Windows. Need to wait for 6.2 at least.

    • @sailorbob74133
      @sailorbob74133 4 months ago

      @@bankmanager Ok, thanks for the reply.

  • @CapaUno1322
    @CapaUno1322 1 month ago +1

    When you have an LLM on your machine, can it still access the internet for information? Just thinking aloud. Thanks, subbed! ;D

    • @deanmakovic3849
      @deanmakovic3849 1 month ago +1

      Turn off the internet and see what happens :)

  • @Cjw9000
    @Cjw9000 5 months ago +1

    GPT-3.5 has about 170B parameters, and I heard that GPT-4 is an MoE with 8 × 120B parameters, so effectively 960B parameters that you would have to load into VRAM (rough math below).

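Back-of-envelope math for that claim, treating the rumored parameter counts as given (OpenAI has not confirmed them): the weights alone need roughly parameters × bytes-per-parameter of memory.

    # Weight memory in GB: 1 billion parameters ≈ 1 GB at 8 bits per weight.
    def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
        return params_billions * bits_per_weight / 8

    print(weight_vram_gb(960, 16))  # rumored GPT-4 MoE at fp16 -> 1920 GB
    print(weight_vram_gb(960, 4))   # even at 4-bit quantization -> 480 GB
    print(weight_vram_gb(7, 4))     # a local 7B model at 4-bit -> ~3.5 GB

Which is why the 7B-13B GGUF quantizations shown in the video fit on a 16GB card, while anything GPT-4-sized is far beyond consumer hardware.
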
  • @jbnrusnya_should_be_punished
    @jbnrusnya_should_be_punished 4 months ago +1

    Does anyone know for sure whether there is RX 580 support? (It's not on the ROCm list; that's why I'm asking.) Or at least, does it work with the RX 6600M? I only see the RX 6600 XT in the compatibility list.

    • @predabot__6778
      @predabot__6778 1 month ago +1

      The RX 6600M is the same chip as the RX 6600 (Navi 23), just with a different vBIOS, and since Navi 23 XT (RX 6600 XT / 6650 XT) is simply the full die without cutting, it should work on the RX 6600M: same chip, just a bit cut down.
      (Not a bad bin, though; it's a good bin, with an even higher base clock than the desktop RX 6600, but shaders cut on purpose to improve efficiency. That is, desktop RX 6600s are failed bins of RX 6600 XTs, which are then cut down to justify their existence, while laptop RX 6600Ms are some of the best 6600 XTs, cut on purpose to save power.)

  • @casius00
    @casius00 3 months ago

    Great video. Worked for me on the first try. Is there a guide somewhere on how to limit/configure a model?

  • @LLlblKAPHO
    @LLlblKAPHO 3 days ago

    How does it work with laptops? We have 2 GPUs, a small one and a large one, and LM Studio picks the small GPU :(

  • @mclab33
    @mclab33 1 month ago

    I'll try it in a few hours with the 780M iGPU and let you know.

  • @adriwicaksono
    @adriwicaksono 5 months ago

    07:34 Not sure if this will fix it, but try unchecking the "GPU offload" box before loading the model; do tell us if it works!

  • @rdsii64
    @rdsii64 21 days ago

    Are any of these models that we can run locally uncensored/unrestricted?

  • @alvarodavidhernandezameson2480
    @alvarodavidhernandezameson2480 4 months ago

    I would like to see how it performs with the standard RX 7600.

  • @dougf6126
    @dougf6126 5 months ago +2

    Am I required to install the AMD HIP SDK for Windows before I can use LM Studio?

  • @WolfgangWeidner
    @WolfgangWeidner 3 months ago

    Amazing. The 7600 (XT) is not even officially supported by AMD's ROCm software.

  • @barderino5673
    @barderino5673 6 months ago +1

    Wait, 30-billion-parameter models are fine with GGUF and 16GB, even with 12GB? Is there something I'm missing?

  • @SBoth_
    @SBoth_ 1 month ago

    I can't set my 7900 XTX to ROCm. The only option is Vulkan.

  • @studiomusicflow4644
    @studiomusicflow4644 23 days ago

    Does anyone know of a way to make an RX 580 run with ROCm on Windows? Yes, it's old, but it would be better than using the processor to play with AI, and there are plenty of RX 580s out there.

  • @djmccullough9233
    @djmccullough9233 6 months ago

    I've been looking to make a dedicated AI machine with an LLM. I have a shelf-bound 6800 XT that has heat issues sustaining gaming (I've repasted it; I think it's partially defective). I didn't want to throw it away; now I know I can repurpose it.

  • @jasonpierce4518
    @jasonpierce4518 3 months ago

    I see you downloading the Windows exe for ROCm LM Studio... how in the hell are you running that exe? I don't see you using a Wine prompt.

  • @ThisIsMMI
    @ThisIsMMI 12 days ago

    Can the RX 570 8GB variant support ROCm?

  • @Beauty.and.FashionPhotographer
    @Beauty.and.FashionPhotographer 2 months ago

    Can you teach us how to run LIMs, large image models?

  • @ul6633
    @ul6633 3 months ago

    Mine isn't using the GPU; it still uses the CPU. 6950 XT.

  • @dava00007
    @dava00007 4 months ago +1

    How is it doing with image generation?

  • @dogoku
    @dogoku 6 months ago +1

    Can we use it to generate images as well (like Midjourney or DALL-E), or does it work only for text?

    • @Medeci
      @Medeci 5 months ago

      Yeah, on Linux with Stable Diffusion.

  • @sailorbob74133
    @sailorbob74133 5 months ago

    Can I do anything useful on the Phoenix NPU? Just bought a Phoenix laptop.

  • @edengate1
    @edengate1 3 months ago

    RX 7600 XT or RX 6750 XT for LLMs? On Windows.

  • @大支爺
    @大支爺 15 days ago +1

    ROCm still sucks today.

  • @MsgForce
    @MsgForce 4 months ago

    I asked if it could generate a QR code for me, and it failed.

  • @CapaUno1322
    @CapaUno1322 1 month ago +1

    Shouldn't you be at the Olympics? Maybe you are! 😅

  • @eliann124
    @eliann124 5 months ago

    Nice mini-LED!

  • @ferluisch
    @ferluisch 2 months ago

    Can you do a comparison vs. CUDA?

  • @ystrem7446
    @ystrem7446 5 months ago +2

    Hi, does it work on the RX 5500 series?

    • @predabot__6778
      @predabot__6778 1 month ago

      Alas, since this uses ROCm and AMD does not list *any* RDNA1 cards, the answer is almost certainly no. You really wouldn't want to try it anyway, since the RX 5500 XT is a severely gimped card (not to mention the horror of the non-XT OEM variant): it has only 1408 shader cores, compared to the next step up, the RX 5600 XT's 2304 cores. That's nearly a 40% cut in compute! And it has a measly 4GB of VRAM, which is complete murder for LLM usage; everything will be slow as molasses. You'd lose more time and money trying to run the model (even if it were supported) than if you just bought an RX 6600. That card is still the best value on the market, so if you want a cheap entry-level card to try this out, I'd recommend it.

  • @Ittorri
    @Ittorri 5 months ago

    So I asked the AI what it recommends if I want to upgrade my PC, and it recommended an RX 8000 XT 💀

  • @StephenConnolly67
    @StephenConnolly67 3 months ago

    Would this work with Ollama?

  • @ssmd7449
    @ssmd7449 4 months ago

    How do I install the ROCm software? I'm at the website, but when I download it, all it does is delete my Adrenalin drivers... Do I need the Pro software to run ROCm? I still want to game on my PC too.

    • @bankmanager
      @bankmanager 4 months ago

      No, you don't need the pro drivers.

    • @ssmd7449
      @ssmd7449 4 months ago

      @@bankmanager How can I install ROCm?

  • @dead_protagonist
    @dead_protagonist 5 months ago +28

    You should mention that ROCm only supports... three... AMD GPUs.

    • @user-hq9fp8sm8f
      @user-hq9fp8sm8f 5 months ago +5

      More than 3

    • @dead_protagonist
      @dead_protagonist 5 months ago

      @@user-hq9fp8sm8f Source?

    • @arg0x-
      @arg0x- 5 months ago +3

      @@user-hq9fp8sm8f Does it support the RX 5600 XT?

    • @user-hq9fp8sm8f
      @user-hq9fp8sm8f 5 months ago

      @@arg0x- No.

    • @highpraise-highcritic
      @highpraise-highcritic 4 months ago +16

      @dead_protagonist You should mention that you don't know what you're talking about and/or didn't read the supported/unsupported GPU compatibility list... or maybe you just can't count ¯\_(ツ)_/¯

  • @seeibe
    @seeibe 2 months ago

    Clickbait title, I suppose? Just because you can run local LLMs doesn't mean the GPU plays in the same league as NVIDIA's consumer GPUs (4090).

  • @marconwps
    @marconwps 3 months ago

    ZLUDA :)

  • @stonedoubt
    @stonedoubt 5 months ago

    How do you get the AMD out of your throat? Just wondering since I’ve never seen anyone gobble so hard…

  • @newdawn005
    @newdawn005 6 months ago +2

    Sad that there is only advertising here. An AMD GPU is bad; where is the video about the problems of an AMD GPU?

    • @bobdickweed
      @bobdickweed 6 months ago

      I have had AMD GPUs for the past 14 years, never a problem. I'm on the 7900 XTX now and it works great for what I do.

    • @Unclesam404
      @Unclesam404 6 months ago +9

      AMD is improving its software at lightning speed. So what are you smoking? Why can't an AMD GPU do GPGPU with good software?

    • @dansanger5340
      @dansanger5340 4 months ago +1

      Not everyone can afford a 4090 GPU. AMD seems like a better value, at the cost of a little extra effort.