Thanks for letting us know about this new release. Just tried it on my 6800xt, and it works. FYI, I think the supported list is all Navi 21 cards and all RDNA 3. That's the same list as the HIP SDK supported cards on the AMD ROCm Windows System Requirements page.
How many tokens/s??? Using a 7B model??
And the 7600 XT is not part of the officially supported list.
@@JoseRoberto-wr1bv On the Q8_0 version of Llama 3 I was getting 80 t/s, but for a couple of reasons the quality wasn't so good. I'm using Mixtral Instruct as my daily driver, and getting 14-18 depending on how I balance offload vs context size.
@@chaz-e that and the 7600 are both gfx1102.
I've successfully run 70B models with 4-bit quantization on my 4070 Ti Super. I offload 27 out of 80 layers to the GPU, while the remainder sits in RAM. It works quite well - not exceedingly fast, but fast enough for comfortable use. A minimum of 64GB of RAM is required. VRAM matters, but in reality you can run 70B networks with even 10GB of VRAM or less. It ultimately comes down to how quickly you need the model to respond to your queries.
It would be nice to try this on an AMD equivalent, maybe a 7800 XT or 7900 XT.
Maybe he can share his tok/s. Comparing my 2080 vs the video (RX 7600), I get these results: I just tried this on my 2080 (non-Super) and I get 62.40 tok/s, which is around 40% faster for a card with around the same gaming performance. The VRAM usage seems a bit lower though (baseline was 1.8 GB, and with the same model loaded it was 7.2 GB), so around 5.4 GB of VRAM for the model. Hopefully AMD can catch up in the future :(
Nice to know. I thought it could only use VRAM or RAM, not both. Good to know it adds up all the available memory.
@@gomesbruno201 7900 XTX is around the same performance as the 4070 Ti Super.
brilliant! Thanks for letting us know, I am excited to try this
Will be trying this out later on, thank you my man.
Thank you for the video. I can now run 8B LLMs with my AMD RX 7600 (8GB) and it is really fast. I use Arch Linux and it runs without any problems 👍
How did you get it to work on Linux? I've been having issues (and Ollama seems to recommend the proprietary AMD drivers....)
@@puffin11 Don't install the AMD Pro (proprietary) drivers. The open-source amdgpu driver is completely sufficient with ROCm.
I was torn between buying an RTX 3060 and an RX 7600; I thought ROCm was not supported on this card. How are image generation and model training?
@@whale2186 If you work a lot with AI models and projects, an Nvidia RTX graphics card is the best choice. AMD ROCm support is okay, but unfortunately not nearly as good as the support from Nvidia's CUDA and cuDNN.
@@sebidev Thank you. I think I should go with a 3060 or 4060 with GPU passthrough.
It works awesome on the 6800 XT. Thank you for the guide.
Is it as fast as in the video?
@@agx4035 The video accurately shows expected performance, yes.
Just picked up a 16gb 6800, can't wait to get it installed and see what this baby can do! ;D
@@CapaUno1322 Update?
Works just fine with an RX 5700 XT; it responds decently fast.
It's not working for me. I have a 7900 XT installed and attempted the same as you, but it just gives an error message for no apparent reason. Drivers are up to date and everything is in order, but nothing.
If you already have this GPU, go ahead and play with LLMs. It's a good place to get started. I started playing with a Vega 56 GPU, which is rock bottom of what ROCm supports for LLMs, if I understand things correctly. If LLMs are the focus and you are buying new, Nvidia is still the better option. An RTX 3060 w/ 12GB of VRAM gives you 20% more tokens/s at 20% less price. I sometimes see used RTX 3080s at the same price point as the RX 7600 XT. You don't need all that VRAM if you don't have the compute power to back it up.
Thanks, the only good video I could find on yt which explained everything easily. Your accent helped me focus. Very useful stuff.
Thank you! Glad to be helpful :D
Amazing video, I learnt a lot! I love these videos about commercial GPUs running AI/ML workloads, as I'm into developing AI/ML models.
Thanks, worked for me very well on my 6800xt! The answers are as quick as in the video. But I guess I need to learn how and what to ask, because the answers were always very confident and always completely wrong and made-up. I asked the chat to make a list of French kings who were married off before they were 18 yo, and it invented a bunch of Kings that never lived, and said that Emperor Napoleon Bonaparte and President Macron were both married off at 16, but they were not kings technically, and they were certainly not married at 16, lol.
Well, it is not like GPGPU came along just with LLMs. OpenCL on AMD GPUs in 2013 and before was the most viable option for crypto mining, while Nvidia was too slow at that time due to small cache sizes and poor efficiency. That all changed with the 750 Ti and the GTX 9xx generation of cards. The history of GPU programming is even longer than that, as people were trying to bend even fixed-pipeline GPUs into calculating things unrelated to graphics. The GeForce 8 with early, limited CUDA was of course a game changer, and I have been a big fan of CUDA and OpenCL ever since. Thanks for a great video on the 7600 XT! ❤
Can you add multiple AMD cards together to increase the power?
This is cool, but I have to say that I'm running Ollama with OpenWebUi and a 1080Ti and I get similarly quick responses. I would assume a newer card would perform much better, so I'm curious where the performance of the new cards really matters for just chatting, if at all.
If you add voice generation, then it matters a lot. With no voice, anything over 10 tokens/sec is pretty usable.
Good vid; however, the AMD ROCm versions of the relevant files are no longer available (the link in the description leads to the generic LM Studio versions)? The later versions don't appear to specifically recognize AMD GPUs?
Does anyone know of a way to make an RX 580 run with ROCm on Windows? Yes, it's old, but it would be better than using the processor to play with AI, and there are plenty of RX 580s out there.
I have a 6800XT, 6900XT and a 7900XT. I will attempt this on each.
Today, I finally jumped off the AMD Struggle Bus and installed an NVIDIA GPU that runs AI like a boss. Instead of waiting SECONDS for two AMD GPUs to SHARE 8GB of memory via torch and pyenv and BIFURCATION software…
My RTX 4070 Super just does the damn calculations right THE FIRST TIME!
When you have an LLM on your machine, can it still access the internet for information? Just thinking aloud. Thanks, subbed! ;D
Turn off the internet and see what happens :)
Can the RX 570 8GB variant support ROCm?
If there's a Windows driver for ROCm, how come PyTorch still only shows ROCm as available for Linux?
Anyway, good to know it works. I'd like to buy a new system dedicated to LLM/diffusion tasks, and yours is the first confirmation it actually works as intended 😅
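For anyone hitting this on Linux: the ROCm builds of PyTorch expose the GPU through the regular torch.cuda API, so a quick sanity check looks something like this (a minimal sketch, assuming a ROCm wheel of torch is installed):

# Quick sanity check for a ROCm build of PyTorch on Linux.
# Assumes a ROCm wheel of torch; HIP devices appear via the torch.cuda API.
import torch

print("HIP version:", torch.version.hip)        # None on CPU-only or CUDA builds
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))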
Wait, 30-billion-parameter models are fine with GGUF and 16GB, even with 12GB? Is there something I'm missing?
Incredible video!
GPU not detected on RX 6800, Windows 10. Edit: nvm, you must load the model first from the top center.
Good news! ;D
What do you mean by “first from the top center”? I couldn't get ROCm to recognize my GPU either, but that was through WSL 2, not this app.
As a total dummy in all things LLM, your video was the catalyst I needed to entertain the idea of learning about all this AI stuff. I'm wondering (and it would be a greatly appreciated video if you make it): is it possible to put this GPU in my streaming PC so that it encodes and uploads the stream and at the same time runs a local LLM that interacts with the chat on Twitch? How can I integrate these models with my Twitch streams?
I've been looking to make a dedicated AI machine with an LLM. I have a shelf-bound 6800 XT that has heat issues sustaining gaming (I have repasted it; I think it's partially defective). I didn't want to throw it away, and now I know I can repurpose it.
Am I required to install the AMD HIP SDK for Windows before I can use LM Studio?
Yes.
Great video. Worked for me on the first try. Is there a guide somewhere on how to limit/configure a model?
Seeing as how I spent last night trying to install ROCm without any luck, nor could I find any good tutorials or a single success story, I'll be curious to see how insanely easy this is. Wait, I don't need to install and run ROCm in WSL?
Hey, I've had success with ROCm 5.7/6.0/6.1.1 on Ubuntu and 5.7 on Windows, so let me know if you're still having an issue and I can probably point you in the right direction.
"If you've used an AMD GPU for compute work, you'll know that's not great"
Bruh that Pugetbench score shows the RX 7900 XTX getting 92.6% of the RTX 4090's performance and it has the same amount of VRAM for at least £700 less. 💀💀
I'll try it in a few hours with the 780M iGPU and let you know
Not working!
You couldn't load the 30B-parameter one because, in your settings, you're trying to offload all layers to your GPU. Play with the setting and try reducing the GPU offload to find your sweet spot.
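If you ever script this outside LM Studio, the same tuning applies with the llama.cpp bindings. A minimal sketch, assuming llama-cpp-python is installed and using a hypothetical GGUF path - lower n_gpu_layers until the model fits in your VRAM:

# Partial GPU offload with llama-cpp-python (sketch; the path and the
# layer count are placeholders - reduce n_gpu_layers until it fits).
from llama_cpp import Llama

llm = Llama(
    model_path="models/30b-model.Q4_K_M.gguf",  # hypothetical file
    n_gpu_layers=40,   # offload only some layers; -1 means "all"
    n_ctx=4096,        # context size also consumes VRAM
)
out = llm("Q: Name one planet. A:", max_tokens=32)
print(out["choices"][0]["text"])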
It would be good if you also made a video for Open WebUI + AMD.
Can you do an update when ROCm 6.1 is integrated to LM Studio?
6.1 is not likely to ever be available on Windows. Need to wait for 6.2 at least.
@@bankmanager Ok, thanks for the reply.
What about multiple 7600 cards?
Please, someone tell me how to make this 7600 XT work normally with Stable Diffusion.
I would like to see how it performs with the standard RX 7600.
How does it work with laptops? We have 2 GPUs, a small one and a large one, and LM Studio uses the small GPU :(
Nice MiniLED!
07:34 Not sure if this will fix it, but try unchecking the "GPU offload" box before loading the model. Do tell us if it works!
Hi, does it work on the RX 5500 series?
Alas... since this uses ROCm, and AMD does not list *any* RDNA1 cards, the answer is almost certainly no. You really wouldn't even want to try it, though, since the RX 5500 XT is a severely gimped card (not to mention the horror of the non-XT OEM variant) - it has only 1408 shader cores, compared to the next step up, the RX 5600 XT's 2304 cores - that's nearly a 40% cut in compute! And it has a measly 4 GB of VRAM... that's complete murder for LLM usage - everything will be slow as molasses. You'll lose more time and money trying to run the model (even if it were supported) than if you just got an RX 6600 - that card is *still* the best value on this market, so if you want a cheap entry-level card to try this out, I would recommend that.
Can we use it to generate images as well (like Midjourney or DALL-E), or does it work only for text?
yeah, on linux with SD
RX 7600 XT or RX 6750 XT for LLM ? On Windows.
Is there RX 580 support? Does anyone know for sure? (It's not on the ROCm list; that's why I'm asking.) Or at least, does it work with the RX 6600M? Because I only see the RX 6600 XT in the compatibility list.
The RX 6600M is the same chip as the RX 6600 (Navi 23), just with a different VBIOS - and since Navi 23 XT (RX 6600 XT / 6650 XT) is simply the full die, without cutting, it should work on the RX 6600M - same chip, just a bit cut down.
(Not a bad bin, though - it's a good bin, with an even higher base clock than the desktop RX 6600, but with shaders cut on purpose to improve efficiency. I.e., desktop RX 6600s are failed bins of RX 6600 XTs, which are then cut down to justify their existence - laptop RX 6600Ms are some of the best 6600 XTs, but cut on purpose to save power.)
How is it doing with image generation?
Can you try Ollama with this ROCm thing? I've been splitting my head trying to get it to work with my 6800 XT.
Ollama doesn’t work with ROCm. It is for nvidia and Apple silicon only.
Are any of these models that we can run locally uncensored/unrestricted?
Can you teach us how to do LIMs, Large Image Models?
I just tried this vs my 2080 (non-Super) and I get 62.40 tok/s, which is around 40% faster for a card with around the same gaming performance. The VRAM usage seems a bit lower though (baseline was 1.8 GB, and with the same model loaded it was 7.2 GB), so around 5.4 GB of VRAM for the model. Hopefully AMD can catch up in the future :(
Can you do a comparison vs CUDA?
ZLUDA is available again btw
I'd like you to try out an 8700G with fast RAM to run LLMs. Also, please run Linux.
Can I do anything useful on the Phoenix NPU? Just bought a Phoenix laptop.
Would this work with Ollama?
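If you do get Ollama running on your hardware, it serves a small local HTTP API you can script against. A minimal sketch, assuming the default localhost:11434 endpoint and an example model name that you have already pulled:

# Query a locally running Ollama server over its REST API (sketch).
# Assumes the default endpoint and that the named model has been pulled.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3",                  # example model name
    "prompt": "Say hi in one sentence.",
    "stream": False,                    # one JSON reply instead of a stream
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])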
How do I install the ROCm software? I'm at the website, but when I download it, all it does is delete my Adrenalin drivers... Do I need the Pro software to run ROCm? I still wanna game on my PC too.
No, you don't need the pro drivers.
@@bankmanager How can I install ROCm?
Mine isn't using the GPU; it still uses the CPU. 6950 XT.
I asked if it could generate a QR code for me and it failed.
Amazing. The 7600 (XT) is not even officially supported in AMD's ROCm software.
ChatGPT 3.5 has about 170B parameters, and I heard that ChatGPT 4 is an MoE with 8 x 120B parameters, so effectively 960B parameters that you would have to load into VRAM.
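As a rough rule of thumb, the weights alone take about (parameter count x bits per weight) / 8 bytes, before counting the KV cache or any runtime overhead - which is why nothing at that scale fits in consumer VRAM. A quick back-of-the-envelope calculation using the rumored figures from the comment above (unconfirmed numbers):

# Back-of-the-envelope VRAM needed just for the weights (ignores the
# KV cache and runtime overhead). Parameter counts are the rumored,
# unconfirmed figures mentioned above.
def weight_gb(params_billion, bits_per_weight):
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, params in [("GPT-3.5 (rumored)", 170), ("8x120B MoE (rumored)", 960)]:
    for bits in (16, 8, 4):
        print(f"{name}: ~{weight_gb(params, bits):.0f} GB at {bits}-bit")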
Shouldn't you be at the Olympics? Maybe you are! 😅
So I asked the ai what it recommends if I want to upgrade my pc and it recommended RX 8000 XT💀
Let me know when AMD can run diffusion models quicker than CPUs 😢
I can't set my 7900 XTX to ROCm. The only option offered is Vulkan.
Hintz Summit
you should mention that ROCm only supports... three... AMD GPUs
More than 3
@@user-hq9fp8sm8f source
@@user-hq9fp8sm8f does it support RX 5600 XT?
@@arg0x- no
@dead_protagonist .. you should mention you don't know what you are talking about and/or didn't read the compatibility supported/unsupported gpu list ...
or ... maybe you just can't count ¯\_(ツ)_/¯
Calling a 350-370€ graphics card "budget" is kinda weird, ngl.
How on earth can these cards be cheaper than NVIDIA's? I think I'll never buy NVIDIA again...
Clickbait title, I suppose? Just because you can run local LLMs doesn't mean the GPU plays in the same league as Nvidia's consumer GPUs (4090).
ROCm still really sucks today.
Zluda :)
NEVER BUY AMD
How do you get the AMD out of your throat? Just wondering since I’ve never seen anyone gobble so hard…
Sad that there is only advertising here - an AMD GPU is bad. Where is the video about the problems of an AMD GPU?
I have had AMD GPUs for the past 14 years, never a problem. I'm on the 7900 XTX now and it works great for what I do.
AMD is improving its software at lightning speed. So what are you smoking? Why can't an AMD GPU do GPGPU with good software?
Not everyone can afford a 4090 GPU. AMD seems like a better value, at the cost of a little extra effort.
Got anything as good for image generation?