How to Turn Your AMD GPU into a Local LLM Beast: A Beginner's Guide with ROCm

  • Published: 16 Nov 2024

Comments • 142

  • @misterpdj
    @misterpdj 7 months ago +31

    Thanks for letting us know about this new release. Just tried it on my 6800xt, and it works. FYI, I think the supported list is all Navi 21 cards and all RDNA 3. That's the same list as the HIP SDK supported cards on the AMD ROCm Windows System Requirements page.

    • @JoseRoberto-wr1bv
      @JoseRoberto-wr1bv 7 months ago +3

      How many tokens/s? Using a 7B model?

    • @chaz-e
      @chaz-e 7 months ago

      And the 7600 XT is not part of the official supported list.

    • @misterpdj
      @misterpdj 6 months ago +3

      @@JoseRoberto-wr1bv On the Q8_0 version of Llama 3 I was getting 80 t/s, but for a couple of reasons the quality wasn't great. I'm using Mixtral Instruct as my daily driver and getting 14-18 t/s, depending on how I balance offload against context size.

    • @misterpdj
      @misterpdj 6 months ago +1

      @@chaz-e That and the 7600 are both gfx1102.

  • @ThePawel36
    @ThePawel36 7 months ago +19

    I've successfully run 70B models with 4-bit quantization on my 4070 Ti Super. I offload 27 out of 80 layers to the GPU, while the remainder sits in system RAM. It works well: not exceedingly fast, but comfortable enough. A minimum of 64GB of RAM is required. VRAM matters, but in practice you can run 70B networks with even 10GB of VRAM or less. It ultimately depends on what response time you find acceptable. (See the sketch after this thread.)

    • @gomesbruno201
      @gomesbruno201 5 months ago +3

      It would be nice to try this on an AMD equivalent, maybe a 7800 XT or 7900 XT.

    • @ferluisch
      @ferluisch 4 months ago +1

      Maybe he can share his tok/s. Comparing my 2080 (non-Super) to the video's RX 7600: I get 62.40 tok/s, which is around 40% faster for a card with around the same gaming performance. VRAM usage seems a bit lower though (baseline was 1.8GB, and with the same model loaded it was 7.2GB, so around 5.4GB for the model). Hopefully AMD can catch up in the future :(

    • @VamosViverFora
      @VamosViverFora 26 days ago

      Nice to know. I thought it used either VRAM or RAM, not both. Good to know it adds up all the available memory.

    • @oggenable
      @oggenable 17 days ago

      @@gomesbruno201 7900 XTX is around the same performance as the 4070 Ti Super.
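
For anyone who wants to reproduce this kind of partial offload outside LM Studio, here is a minimal sketch using the llama-cpp-python bindings. The model filename and layer counts are illustrative assumptions mirroring the "27 of 80 layers" split described above, not figures from the video.

    # Partial GPU offload with llama-cpp-python (built with ROCm/HIP or CUDA).
    # Layers not offloaded to VRAM are kept in system RAM and run on the CPU.
    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-70b.Q4_K_M.gguf",  # hypothetical 4-bit GGUF file
        n_gpu_layers=27,  # layers placed in VRAM; the rest stay in system RAM
        n_ctx=4096,       # context window; larger values eat the same VRAM budget
    )

    out = llm("Explain mixture-of-experts in one paragraph.", max_tokens=128)
    print(out["choices"][0]["text"])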

  • @cj_zak1681
    @cj_zak1681 7 months ago +6

    Brilliant! Thanks for letting us know, I'm excited to try this.

  • @pedromartins4847
    @pedromartins4847 7 months ago +3

    Will be trying this out later on, thank you my man.

  • @sebidev
    @sebidev 2 months ago +7

    Thank you for the video. I can now run 8B LLMs on my AMD RX 7600 (8GB) and it's really fast. I use Arch Linux and it runs without any problems 👍

    • @puffin11
      @puffin11 2 months ago

      How did you get it to work on Linux? I've been having issues (and Ollama seems to recommend the proprietary AMD drivers...)

    • @sebidev
      @sebidev 2 months ago

      @@puffin11 Don't install the AMD PRO (proprietary) drivers. The open-source amdgpu driver is completely sufficient with ROCm. (A quick check is sketched after this thread.)

    • @whale2186
      @whale2186 1 month ago

      I was torn between buying an RTX 3060 and an RX 7600; I thought ROCm wasn't supported on this card. How are image generation and model training?

    • @sebidev
      @sebidev 1 month ago

      @@whale2186 If you work a lot with AI models and projects, an Nvidia RTX graphics card is the best choice. AMD's ROCm support is okay, but unfortunately not nearly as good as Nvidia's CUDA and cuDNN support.

    • @whale2186
      @whale2186 1 month ago

      @@sebidev Thank you. I think I should go with a 3060 or 4060 with GPU passthrough.
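
A quick way to confirm that the open-source stack sees the card on Linux, as described above: the rocminfo tool ships with the ROCm packages and lists detected GPU agents by their gfx name. A minimal sketch, assuming rocminfo is on the PATH:

    # List ROCm-visible GPU agents; "gfx" names (e.g. gfx1102) identify them.
    import subprocess

    info = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
    gpus = [line.strip() for line in info.splitlines() if "gfx" in line]
    print(gpus or "No ROCm-visible GPU found - check the amdgpu/ROCm install")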

  • @Dj-Mccullough
    @Dj-Mccullough 7 months ago +11

    It works great on the 6800 XT. Thank you for the guide.

    • @agx4035
      @agx4035 6 months ago +2

      Is it as fast as in the video?

    • @bankmanager
      @bankmanager 6 months ago +6

      @@agx4035 The video accurately shows expected performance, yes.

    • @CapaUno1322
      @CapaUno1322 3 months ago +1

      Just picked up a 16GB 6800, can't wait to get it installed and see what this baby can do! ;D

    • @Helldiver111
      @Helldiver111 2 months ago

      @@CapaUno1322 Any update?

  • @aurimasc5333
    @aurimasc5333 2 months ago +2

    Works just fine with the RX 5700 XT; it responds decently fast.

  • @MiguelGonzalez-nv2rt
    @MiguelGonzalez-nv2rt 7 months ago +7

    It's not working for me. I have a 7900 XT installed and attempted the same as you, but it just throws an error message for no apparent reason. Drivers are up to date and everything is in order, but nothing.

  • @oggenable
    @oggenable 17 days ago

    If you already have this GPU, go ahead and play with LLMs; it's a good place to get started. I started playing with a Vega 56, which is rock bottom of what ROCm supports for LLMs, if I understand things correctly. If LLMs are the focus and you are buying new, Nvidia is still the better option. An RTX 3060 with 12GB of VRAM gives you 20% more tokens/s at 20% less price. I sometimes see used RTX 3080s at the same price point as the RX 7600 XT. You don't need all that VRAM if you don't have the compute power to back it up.
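
The price/performance claim above works out as simple arithmetic; a tiny worked sketch with the comment's own relative numbers (illustrative, not measurements):

    # 20% more tokens/s at 20% lower price, relative to the RX 7600 XT.
    rel_speed = 1.20  # RTX 3060 tokens/s relative to the RX 7600 XT
    rel_price = 0.80  # RTX 3060 price relative to the RX 7600 XT
    print(f"Tokens/s per currency unit: {rel_speed / rel_price:.2f}x")  # 1.50x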

  • @jakeastles
    @jakeastles 4 months ago +3

    Thanks, this is the only good video I could find on YouTube that explains everything clearly. Your accent helped me focus. Very useful stuff.

    • @TechteamGB
      @TechteamGB 4 months ago

      Thank you! Glad to be helpful :D

  • @joshuat6124
    @joshuat6124 7 months ago +1

    Amazing video, I learnt a lot! I love these videos about commercial GPUs running AI/ML workloads, as I'm into developing AI/ML models.

  • @myroslav6873
    @myroslav6873 3 months ago +2

    Thanks, this worked very well on my 6800 XT! The answers are as quick as in the video. But I guess I need to learn how and what to ask, because the answers were always very confident and always completely wrong and made up. I asked the chat to list French kings who were married off before they were 18, and it invented a bunch of kings who never lived, and said that Emperor Napoleon Bonaparte and President Macron were both married off at 16; they were not kings, technically, and they were certainly not married at 16, lol.

  • @VasilijP
    @VasilijP 3 months ago +1

    Well, it is not like GPGPU arrived with LLMs. OpenCL on AMD GPUs in 2013 and earlier was the most viable option for crypto mining, while Nvidia was too slow at the time due to small cache sizes and poor efficiency. That all changed with the 750 Ti and GTX 9xx generation of cards. The history of GPU programming is even longer than that, as people tried to bend even fixed-pipeline GPUs into calculating things unrelated to graphics. The GeForce 8 with early, limited CUDA was of course a game changer, and I have been a big fan of CUDA and OpenCL ever since. Thanks for a great video on the 7600 XT! ❤

  • @mysticalread
    @mysticalread 4 months ago +3

    Can you combine multiple AMD cards to increase the power?

  • @fretbuzzly
    @fretbuzzly 5 months ago +2

    This is cool, but I have to say that I'm running Ollama with Open WebUI on a 1080 Ti and I get similarly quick responses. I would assume a newer card performs much better, so I'm curious where the performance of the new cards really matters for just chatting, if at all.

    • @leucome
      @leucome 5 months ago +1

      If you add voice generation then it matters a lot. Without voice, anything over 10 tokens/s is pretty usable.

  • @clementajaegbu6660
    @clementajaegbu6660 1 month ago

    Good vid; however, the AMD ROCm versions of the relevant files are no longer available (the link in the description leads to generic LM Studio versions). The later versions don't appear to specifically recognize AMD GPUs?

  • @studiomusicflow4644
    @studiomusicflow4644 2 months ago +2

    Does anyone know of a way to make an RX 580 run with ROCm on Windows? Yes, it's old, but it would be better than using the processor to play with AI, and there are plenty of RX 580s out there.

  • @Machistmo
    @Machistmo 2 months ago

    I have a 6800XT, 6900XT and a 7900XT. I will attempt this on each.

  • @crypto_que
    @crypto_que 3 months ago

    Today I finally jumped off the AMD Struggle Bus and installed an NVIDIA GPU that runs AI like a boss. Instead of waiting SECONDS for two AMD GPUs to SHARE 8GB of memory via torch and pyenv and BIFURCATION software…
    My RTX 4070 Super just does the damn calculations right THE FIRST TIME!

  • @CapaUno1322
    @CapaUno1322 3 months ago +1

    When you have an LLM on your machine, can it still access the internet for information? Just thinking aloud. Thanks, subbed! ;D

    • @deanmakovic3849
      @deanmakovic3849 2 months ago +1

      Turn off the internet and see what happens :)

  • @ThisIsMMI
    @ThisIsMMI 2 months ago +1

    Does the RX 570 8GB variant support ROCm?

  • @RobertoMaurizzi
    @RobertoMaurizzi 1 month ago

    If there's a Windows driver for ROCm, how come PyTorch still only shows ROCm as available for Linux?
    Anyway, good to know it works. I'd like to buy a new system dedicated to LLM/diffusion tasks, and yours is the first confirmation that it actually works as intended 😅
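
For the PyTorch question above: ROCm builds of PyTorch (published for Linux) reuse the torch.cuda API, so a quick check looks like this. A minimal sketch, assuming a ROCm wheel is installed:

    # On ROCm builds torch.version.hip is set; on CUDA/CPU builds it is None.
    import torch

    print("HIP runtime:", torch.version.hip)
    print("GPU visible:", torch.cuda.is_available())  # True if ROCm sees the card
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))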

  • @barderino5673
    @barderino5673 7 months ago +1

    Wait, a 30-billion-parameter model is fine with GGUF and 16GB, or even with 12? Is there something I'm missing?

  • @robertmiller1638
    @robertmiller1638 3 months ago

    Incredible video!

  • @losttale1
    @losttale1 6 months ago +4

    GPU not detected on RX 6800, Windows 10. Edit: never mind, you must load the model first from the top center.

    • @CapaUno1322
      @CapaUno1322 3 months ago +1

      Good news! ;D

    • @nicholasfall838
      @nicholasfall838 3 months ago +1

      What do you mean by "first from the top center"? I couldn't get ROCm to recognize my GPU either, but that was through WSL 2, not this app.

  • @ols7462
    @ols7462 7 months ago +4

    As a total dummy in all things LLM, your video was the catalyst I needed to entertain the idea of learning about all this AI stuff. I'm wondering, and a video on this would be greatly appreciated if you made one: is it possible to put this GPU in my streaming PC so that it encodes and uploads the stream while also running a local LLM that interacts with the chat on Twitch? How can I integrate these models with my Twitch streams?

  • @Dj-Mccullough
    @Dj-Mccullough 7 months ago +1

    I've been looking to build a dedicated AI machine with an LLM. I have a shelf-bound 6800 XT that has heat issues sustaining gaming (I have repasted it; I think it's partially defective). I didn't want to throw it away; now I know I can repurpose it.

  • @dougf6126
    @dougf6126 7 months ago +2

    Am I required to install the AMD HIP SDK for Windows before I can use LM Studio?

  • @casius00
    @casius00 5 months ago

    Great video. Worked for me on the first try. Is there a guide somewhere on how to limit/configure a model?

  • @BigFarm_ah365
    @BigFarm_ah365 6 months ago +2

    Seeing as I spent last night trying to install ROCm without any luck, and couldn't find any good tutorials or a single success story, I'll be curious to see how insanely easy this is. Wait, I don't need to install and run ROCm in WSL?

    • @bankmanager
      @bankmanager 6 months ago +1

      Hey, I've had success with ROCm 5.7/6.0/6.1.1 on Ubuntu and 5.7 on Windows, so let me know if you're still having an issue and I can probably point you in the right direction.

  • @CodeCube-rv1rm
    @CodeCube-rv1rm 27 days ago +1

    "If you've used an AMD GPU for compute work, you'll know that's not great"
    Bruh that Pugetbench score shows the RX 7900 XTX getting 92.6% of the RTX 4090's performance and it has the same amount of VRAM for at least £700 less. 💀💀

  • @mclab33
      @mclab33 3 months ago +1

    I'll try it in a few hours with the 780M iGPU and let you know

    • @mclab33
      @mclab33 1 month ago +1

      Not working!

  • @CP-oo8mj
    @CP-oo8mj 3 months ago

    You couldn't load the 30B-parameter one because in your settings you're trying to offload all the layers to your GPU. Play with the setting and try reducing the GPU offload to find your sweet spot. (A rough sizing sketch follows below.)
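
A back-of-the-envelope version of that "sweet spot" search: estimate how many layers fit in VRAM and offload only that many. All figures here are illustrative assumptions, not measurements from the video:

    # Estimate how many transformer layers fit in VRAM, leaving headroom
    # for the KV cache and the runtime itself.
    model_gb = 17.0   # assumed size of a ~30B model at 4-bit quantization
    n_layers = 60     # assumed total layer count for the model
    vram_gb = 16.0    # RX 7600 XT VRAM
    headroom = 0.8    # keep ~20% free for context and overhead

    per_layer_gb = model_gb / n_layers
    max_offload = int(vram_gb * headroom / per_layer_gb)
    print(f"Offload about {min(max_offload, n_layers)} of {n_layers} layers")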

  • @jaiderariza1292
    @jaiderariza1292 15 days ago

    It would be good to also create a video on Open WebUI + AMD.

  • @sailorbob74133
    @sailorbob74133 7 months ago +2

    Can you do an update when ROCm 6.1 is integrated into LM Studio?

    • @bankmanager
      @bankmanager 6 months ago +1

      6.1 is not likely to ever be available on Windows. Need to wait for 6.2 at least.

    • @sailorbob74133
      @sailorbob74133 6 months ago

      @@bankmanager Ok, thanks for the reply.

  • @duality4y
    @duality4y 1 month ago

    What about multiple 7600 cards?

  • @ИльяАникин
    @ИльяАникин 9 days ago

    Please, someone tell me how to make this 7600 XT work properly with Stable Diffusion.

  • @alvarodavidhernandezameson2480
    @alvarodavidhernandezameson2480 6 months ago

    I would like to see how it performs with the standard RX 7600.

  • @LLlblKAPHO
    @LLlblKAPHO 2 months ago

    How does it work with laptops? We have 2 GPUs, a small one and a large one, and LM Studio picks the small GPU :(

  • @eliann124
    @eliann124 7 months ago

    Nice mini-LED!

  • @adriwicaksono
    @adriwicaksono 7 months ago

    07:34 Not sure if this will fix it, but try unchecking the "GPU offload" box before loading the model; do tell us if it works!

  • @ystrem7446
    @ystrem7446 7 months ago +2

    Hi, does it work on the RX 5500 series?

    • @predabot__6778
      @predabot__6778 3 months ago

      Alas... since this uses ROCm, and AMD does not list *any* RDNA1 cards, the answer is almost certainly no. You really wouldn't even want to try it, though, since the RX 5500 XT is a severely gimped card (not to mention the horror of the non-XT OEM variant): it has only 1408 shader cores, compared to the next step up, the RX 5600 XT's 2304 cores; that's nearly a 40% cut in compute. And it has a measly 4GB of VRAM, which is complete murder for LLM usage; everything will be slow as molasses. You'd lose more time and money trying to run the model (even if it were supported) than if you just got an RX 6600. That card is still the best value on this market, so if you want a cheap entry-level card to try this out, I would recommend it.

  • @dogoku
    @dogoku 7 months ago +1

    Can we use it to generate images as well (like Midjourney or DALL-E), or does it work only for text?

    • @Medeci
      @Medeci 7 months ago

      Yeah, on Linux with Stable Diffusion.

  • @edengate1
    @edengate1 5 months ago

    RX 7600 XT or RX 6750 XT for LLMs? On Windows.

  • @pihva_rusni
    @pihva_rusni 6 months ago +1

    Does anyone know for sure whether there's RX 580 support? (It's not on the ROCm list, which is why I'm asking.) Or at least, does it work with the RX 6600M? I only see the RX 6600 XT in the compatibility list.

    • @predabot__6778
      @predabot__6778 3 months ago +1

      The RX 6600M is the same chip as the RX 6600 (Navi 23), just with a different vBIOS. And since Navi 23 XT (RX 6600 XT/6650 XT) is simply the full die, without cutting, it should work on the RX 6600M: same chip, just a bit cut down.
      (Not a bad bin, though; it's a good bin, with an even higher base clock than the desktop RX 6600, but with shaders cut on purpose to improve efficiency. That is, desktop RX 6600s are failed bins of RX 6600 XTs, cut down to justify their existence, while laptop RX 6600Ms are some of the best 6600 XT dies, cut on purpose to save power.)

  • @dava00007
    @dava00007 5 months ago +1

    How is it doing with image generation?

  • @HypoCT
    @HypoCT 1 month ago

    Can you try Ollama with this ROCm thing? I've been racking my brain trying to get it to work with my 6800 XT.

    • @ManjaroBlack
      @ManjaroBlack 1 month ago

      Ollama doesn’t work with ROCm. It is for nvidia and Apple silicon only.

  • @rdsii64
    @rdsii64 2 months ago

    Are any of these models that we can run locally uncensored/unrestricted?

  • @Beauty.and.FashionPhotographer
    @Beauty.and.FashionPhotographer 4 months ago

    Can you teach us how to do LIMs, Large Image Models?

  • @ferluisch
    @ferluisch 4 months ago

    I just tried this vs my 2080 (non-Super) and I get 62.40 tok/s, which is around 40% faster for a card with around the same gaming performance. VRAM usage seems a bit lower though (baseline was 1.8GB, and with the same model loaded it was 7.2GB, so around 5.4GB for the model). Hopefully AMD can catch up in the future :(

  • @ferluisch
    @ferluisch 4 months ago

    Can you do a comparison vs CUDA?

  • @safayatjamil2719
    @safayatjamil2719 1 month ago

    ZLUDA is available again btw

  • @IntenseGrid
    @IntenseGrid 17 days ago

    I'd like you to try out an 8700G with fast RAM to run LLMs. Also, please run Linux.

  • @sailorbob74133
    @sailorbob74133 7 months ago

    Can I do anything useful on the Phoenix NPU? Just bought a Phoenix laptop.

  • @StephenConnolly67
    @StephenConnolly67 5 months ago

    Would this work with Ollama?

  • @ssmd7449
    @ssmd7449 6 months ago

    How do I install the ROCm software? I'm at the website, but when I download it, all it does is delete my Adrenalin drivers... Do I need the PRO software to run ROCm? I still wanna game on my PC too.

    • @bankmanager
      @bankmanager 6 months ago

      No, you don't need the pro drivers.

    • @ssmd7449
      @ssmd7449 6 months ago

      @@bankmanager How can I install ROCm?

  • @ul6633
    @ul6633 5 months ago

    Mine isn't using the GPU; it still uses the CPU. 6950 XT.

  • @MsgForce
    @MsgForce 6 months ago

    I asked if it could generate a QR code for me, and it failed.

  • @WolfgangWeidner
    @WolfgangWeidner 5 months ago

    Amazing. The 7600 (XT) is not even officially supported in AMD's ROCm software.

  • @Cjw9000
    @Cjw9000 7 months ago +1

    ChatGPT 3.5 has about 170B parameters, and I heard that ChatGPT 4 is an MoE with 8 × 120B parameters, so effectively 960B parameters that you would have to load into VRAM. (See the arithmetic sketch below.)
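
The VRAM implication of those (rumoured, unconfirmed) figures is simple arithmetic: weight memory is roughly parameter count times bytes per parameter. A worked sketch:

    # Approximate weight memory for a given parameter count and precision.
    def vram_gb(params_billion: float, bytes_per_param: float) -> float:
        return params_billion * bytes_per_param  # 1e9 params x bytes, over 1e9 bytes/GB

    print(vram_gb(170, 2.0))  # ~340 GB at fp16
    print(vram_gb(960, 2.0))  # ~1920 GB at fp16 for the rumoured 8 x 120B MoE
    print(vram_gb(960, 0.5))  # ~480 GB even at 4-bit quantization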

  • @CapaUno1322
    @CapaUno1322 3 months ago +1

    Shouldn't you be at the Olympics? Maybe you are! 😅

  • @Ittorri
    @Ittorri 7 months ago

    So I asked the AI what it recommends if I want to upgrade my PC, and it recommended an RX 8000 XT 💀

  • @Larimuss
    @Larimuss 4 days ago +1

    Let me know when AMD can run diffusion models quicker than CPUs 😢

  • @SBoth_
    @SBoth_ 3 months ago

    I can't set my 7900 XTX to ROCm. The only option offered is Vulkan.

  • @ZachariasDaisy-b6d
    @ZachariasDaisy-b6d 1 month ago

    Hintz Summit

  • @dead_protagonist
    @dead_protagonist 7 months ago +35

    You should mention that ROCm only supports... three... AMD GPUs.

    • @user-hq9fp8sm8f
      @user-hq9fp8sm8f 7 months ago +9

      More than 3

    • @dead_protagonist
      @dead_protagonist 7 months ago

      @@user-hq9fp8sm8f Source?

    • @arg0x-
      @arg0x- 7 months ago +3

      @@user-hq9fp8sm8f Does it support the RX 5600 XT?

    • @user-hq9fp8sm8f
      @user-hq9fp8sm8f 7 months ago

      @@arg0x- No.

    • @highpraise-highcritic
      @highpraise-highcritic 6 months ago +21

      @dead_protagonist You should mention that you don't know what you're talking about and/or didn't read the supported/unsupported GPU compatibility list...
      or maybe you just can't count ¯\_(ツ)_/¯

  • @jacobtinkle9686
    @jacobtinkle9686 1 month ago

    Calling a 350-370€ graphics card "budget" is kinda weird, ngl.

  • @duality4y
    @duality4y 1 month ago

    How on earth can these cards be cheaper than NVIDIA? I think I'll never buy NVIDIA again...

  • @seeibe
    @seeibe 3 months ago

    Clickbait title, I suppose? Just because you can run local LLMs doesn't mean the GPU plays in the same league as Nvidia's consumer GPUs (4090).

  • @大支爺
    @大支爺 2 months ago +1

    ROCm still sucks today.

  • @marconwps
    @marconwps 5 months ago

    ZLUDA :)

  • @ИльяАникин
    @ИльяАникин 10 days ago

    NEVER BUY AMD

  • @stonedoubt
    @stonedoubt 6 months ago

    How do you get the AMD out of your throat? Just wondering since I’ve never seen anyone gobble so hard…

  • @newdawn005
    @newdawn005 7 months ago +2

    Sad that there is only advertising here; AMD GPUs are bad. Where is the video about the problems of an AMD GPU?

    • @bobdickweed
      @bobdickweed 7 months ago +2

      I have had AMD GPUs for the past 14 years, never a problem. I'm on the 7900 XTX now and it works great for what I do.

    • @Unclesam404
      @Unclesam404 7 months ago +11

      AMD is improving its software at lightning speed. So what are you smoking? Why can't an AMD GPU do GPGPU with good software?

    • @dansanger5340
      @dansanger5340 6 months ago +2

      Not everyone can afford a 4090 GPU. AMD seems like a better value, at the cost of a little extra effort.

  • @DandyDude
    @DandyDude 22 days ago

    Got anything as good for image generation?