I am really hoping AMD can make not using CUDA a reality
It's all up to the devs really.
I've given up on AMD gpus ever competing in compute.
Hopefully Intel's OneAPI works out.
@@zacker150 It's so weird to comment this on the same video in which you heard that they're currently competing in several of the top supercomputers
@@RaaynML The AMD environment lacks tooling. Though a new tool, MIPerf, is coming and should play a similar role to the Nsight Compute tool Nvidia provides
They are trying hard, very hard; however, the curse of the Ctrl-C Ctrl-V runs too strong in the programming community.
my god Wendell you really made my day with that Shining meme 🤣!!! thank you
Being able to write code once and being able to run on either platform is so huge.
Yes, Java promised to do this a gabillion years ago. Sadly I don't see any new tool getting any closer.
It would be cool to see just some torch benchmarks of some regular ML models vs 3090 and other Nvidia cards.
love this channel, learned a lot over the years, thanks Wendell!
I really hope everyone starts pronouncing it Rock'em, like Rock'em Sock'em Robots. It's much more funner that way.
Same here, I thought that was just how it was pronounced
People pronouncing it another way hadn't occurred to me.
Small m implies that it should be pronounced this way!
OMG WENDELL!!!! @ 3:00 Is that Betty White as a ZOMBIE on your desk?!?!?!?! THATS AWESOME!!!! lmao!!!
I regard you and Steve Burke as the two best voices in the computer hardware space. Your channel is a treasure trove of information
12:47 - MI25 also supports SR-IOV, but there's no public documentation on how to actually utilize it
Tell us more please.
@@wayland7150 I think @antonkovalenko is referencing the way you can flash the vBIOS of a WX9100 onto an MI25 and use it for GPU tasks. I think the only way to do it right now after you flash is to use GPU-P (Hyper-V). Look for Vega 64 GPU virtualization
@@2megaweeman Yeah, unfortunately the MI25 does not make sense for the homelab at the current price. I'm really wanting SR-IOV; it would make these cards worth a lot more than Vega if someone smart could show us how to do that.
That Shining meme was pure gold.
So, ROCm can help port CUDA stuff to OpenMP or whatever the open standard is called on the data center side. I hope that it is also easier for desktop CUDA code to be ported, so that, for example, ANSYS can support AMD GPUs more easily.
👍🏻
ROCm (formerly HSA) has had tools to port CUDA workloads for years, but the presence and convenience of CUDA has been too strong for people to care. All it takes is an open-source project and a company willing to change from the norm for whatever reason.
Wendell flips heavy server gear with ease and grace, meanwhile, ....... Linus drops everything.
The virgin tech enthusiast vs. the chad IT professional.
Been using ROCm on a 6700xt for stable diffusion and I'm shocked how well it performs considering it's not even a CDNA GPU.
It's really cool tech to tinker with. I'm also using a 6700XT in SD. It's so nice to have 12 GB of VRAM.
Could you recommend any tutorial how to make it work?
Most recent AMD consumer GPUs have support for it
Do you happen to have any tutorials on running such models with consumer GPUs? I have a 6800XT and would love to work on it. The farthest I got was using the default Docker container with TensorFlow; not sure if I'm on the right track? Thanks for any input.
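Not a full tutorial, but the ROCm container is the right track. A quick sanity check that the card is actually being used is just asking TensorFlow what it sees. A minimal sketch, assuming you're inside the rocm/tensorflow Docker image (or have the tensorflow-rocm pip package installed):

```python
import tensorflow as tf

# A working ROCm install shows the Radeon as an ordinary 'GPU' device.
print(tf.config.list_physical_devices("GPU"))

# Run one op explicitly on the GPU to confirm kernels actually launch.
with tf.device("/GPU:0"):
    a = tf.random.normal((2048, 2048))
    b = tf.random.normal((2048, 2048))
    c = tf.matmul(a, b)
print(c.device, float(tf.reduce_sum(c)))
```

If the list comes back empty on a 6800XT, people often report needing the HSA_OVERRIDE_GFX_VERSION=10.3.0 environment variable, since RDNA2 isn't on the official support list; that's an unofficial workaround, so your mileage may vary.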
going to be so fun watching you do vids on these! The enterprise side is so interesting atm!
It's good to hear ROCm has gotten easier to install. Back when I was using a Vega 56 I tried installing it. It was a nightmare. I gave up and just used a Docker image.
Would love to see stable diffusion performance on this machine. How large an image can you generate with the pooled gpu memory?
You should just try out stable-diffusion making 4k images instead of 1024x1024. The processing requirements scale quadratically, as does pixel density, with larger text-to-image generation, so it's not feasible on a normal human system, but the algorithm and walkthroughs are so well organized that anyone should be able to download the weights, set it up and get it running. You'd be the first with 4k diffusion, and you could even try training it up to get better at faces and hands using that 2U-sized sweet, sweet top-of-rack candy 😍
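If anyone wants to actually try this, here's a minimal sketch using Hugging Face's diffusers library (the checkpoint name is just an example, and pushing far past the training resolution tends to break composition well before you hit 4k, so VRAM and patience are the real limits):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint in half precision to save VRAM.
# The ROCm build of PyTorch exposes the Radeon through the usual "cuda" device.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Attention slicing trades speed for memory, which helps at big canvas sizes.
pipe.enable_attention_slicing()

image = pipe(
    "a rack of GPU servers, cinematic lighting",
    height=1024,  # bump these toward 4k as VRAM allows
    width=1024,
).images[0]
image.save("out.png")
```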
N00b question. Can one test ROCm on consumer RDNA2?
Asking the real questions. As far as I know, no, but I'm sure someone will figure it out.
I work in the field of photogrammetry, a subset of computer vision, and I'm praying to whatever deity is willing to listen to make CUDA obsolete, but everything is moving so, so slow. Quite a while back I came across SYCL and I was mightily impressed, but it was in super early stages and I haven't checked back recently.
Nvidia has had a horrible stranglehold on the whole computer vision industry for quite a while, but there might be some cracks showing given their recent open-sourcing of CV-CUDA libraries, which, you don't need me to point out, is an incredibly un-Nvidia move to pull - following their earlier and also un-Nvidia move of sort-of open-sourcing their driver for Linux.
Nvidia also started updating their support for OpenCL. You're no longer stuck forever on version 1.2 if you have an Nvidia GPU, but can now use 3.0!
Maybe you should have a look into OpenCL. It's pretty much CUDA but as an open standard with support from all major vendors (for both GPU and CPU).
It just needs publicity...
Very good video... I think I know why everyone is rushing to support AMD... About 3 months or so ago I was watching a tech video about self-driving, and the gist was that full self-driving will require around 2 PF of BF16, and if AMD hits their target with MI300 it will have around 2.5 PF (QOPS?), as the MI250X has 383 TOPS with MI300 aiming for 8x the AI perf (from AMD's presentation)...
That's exciting AF...
16:47 🤣 relentless execution
Perhaps not the only reason, but the DOE's Energy Exascale Earth System Model (E3SM, the DOE climate model), requires big-time FP64 flops. AMD is, and has been for a while, WAY ahead of NVIDIA when it comes to FP64. Btw, running E3SM might be a good test. As far as I know, DOE has developed containerized versions of E3SM, and you should be able to download and run it (or a small chunk of it) on that machine.
I'll add that traditionally climate and weather models have been written in Fortran. DOE has sunk a lot of effort into getting code refactored into C++ to be able to use GPUs. NASA instead has just stuck with CPUs in their machines. Big question where the field as a whole goes from here.
Modern Fortran is still used even today for scientific computing. If you're a scientist who doesn't have time to deal with the quirks of C-family languages, then Fortran is really the best choice for you.
I'd LOVE this for BOINC!!! "Drool"
I like these little looks into Wendell's server room. It's basically my dream home setup. I've no clue what I'd do with it all, probably waste time putting pihole on kubernetes or something, but still.
I'm actually really excited about new, improved ROCm. I've got torch running on a 6900XT so I can sort of do CUDA through ROCm already, but it's still missing an awful lot of features and performance compared to the Nvidia version, 99% of the time I'm better off just using an Nvidia card, even though my best Nvidia stuff is two generations behind RDNA2. I think consumer-accessible and actually fun machine learning things like Stable Diffusion is a great thing for this field, the more people who get into CUDA and ROCm, the more emphasis will be placed on accessible hardware with >8GB of GDDR and decent compute capabilities that are easy enough to use that even I could set it up.
Unfortunately the reality is that, despite the advances they've made, AMD aren't really a competitor yet. Nvidia still has an enormous headstart, and breaking the "vendor lock-in" that CUDA so effectively creates is only the first step. AMD need to actually deliver competitive performance. They're in a good position to do that: chiplets are the future and Nvidia's monolithic dies are getting truly ridiculous (>600mm²!); AMD's yields are going to be far higher, which means they should be able to afford to stuff more cores into their products. That they aren't is somewhat baffling to me.
Hi Wendell. Do you intend to benchmark rocm for pytorch? I'm very interested in this and it seems like it doesn't really exist on the web. As others have said, Cuda dependence is scary!
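Same wish here. In the meantime it's easy to roll a crude number yourself, because the ROCm build of PyTorch exposes the Radeon through the same torch.cuda API. A rough sketch (arbitrary toy model, not a proper benchmark) that runs unchanged on either vendor's card:

```python
import time
import torch
from torch import nn

device = torch.device("cuda")  # ROCm builds reuse the "cuda" device name
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Synthetic batch so the timing only measures compute, not data loading.
x = torch.randn(512, 1024, device=device)
y = torch.randint(0, 10, (512,), device=device)

torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(100):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
torch.cuda.synchronize()
print(torch.cuda.get_device_name(0), f"{(time.perf_counter() - t0) / 100 * 1e3:.2f} ms/step")
```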
Maybe in the future we can see what us poor people can still do with an MI25. I struggled for a little bit to get ROCm installed (apparently Vega support ended after ROCm 5.0, I think it was; specific versions of Linux too, apparently), then I gave up and flashed its vBIOS to a WX9100... after bashing my head off my keyboard to figure out the right buttons to press to get the flash to work... and realizing there were 2 BIOS chips that needed to be flashed.
I've seen those for less than $100 on eBay. I would really love to get one or two of those working for a VDI project that I'm working on. I really hate GRID.
@@ewilliams28 Paid $80 plus tax for mine. I'd love a good excuse to use it for more than just gaming, but that was my main goal; so not a big concern, just nerd desires.
Important stuff, thank you!
The title is as appealing as the scientific names of most plants.
ROCm is great because you can have the same machine learning setup on your workstation as on the supercomputer. This will succeed for the same reason that x86 succeeded and the same reason that Linux succeeded - accessibility by the masses. I believe the popular term these days is Democratisation.
16:41 This was amazing!!!😂
I would love to be able to use Instinct cards and be able to get rid of GRID as well.
So what about oneAPI and HIP? Now we need five APIs, for example, to run raytracing on the GPU in Blender (Nvidia OptiX and CUDA, AMD HIP, Intel oneAPI, and Mac Metal). How will a small team or individual working on a piece of software that needs GPU acceleration get that to work (decently optimized) on all mainstream platforms?
They could use OpenCL.
An already existing API with support from all major vendors for CPU and GPU computation (and everything else that implements it, e.g. FPGAs). It also supports all major OSes (Windows, Linux, Mac, and even Android, just to name a few).
I just don't get why we need another standard that does the exact same thing.
@@Pheatrix Yeah, but it's badly buggy and you could never get close to the performance of CUDA. That's why it's abandoned. So seriously, no dev is gonna use OpenCL for high-performance GPGPU. Apple even completely removed support for it in favor of their own, way better performing Metal API.
@@ramanmono
BOINC, pretty much every cryptominer, and a lot of other programs use OpenCL.
The performance gap between CUDA and OpenCL is there because Nvidia decided to only support up to OpenCL 1.2; however, there are a lot of features that require at least 2.0.
Recently Nvidia bumped the supported version up to 3.0, so the performance gap should no longer be there.
And the bugs: well, every vendor has to implement their own driver and compiler. AMD is known for buggy drivers, and as I already said, Nvidia pretty much abandoned OpenCL in favor of their proprietary solution.
All of these problems are solvable. And with way less work than creating a completely new solution that solves the exact same problem.
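If anyone is curious what their own driver actually reports, pyopencl makes that check trivial. A small sketch, assuming pyopencl and at least one OpenCL runtime are installed:

```python
import pyopencl as cl

# Print every OpenCL platform (driver) and device on this machine along
# with the version string each one advertises - this is where the old
# "stuck on 1.2" situation on Nvidia shows up, or doesn't anymore.
for platform in cl.get_platforms():
    print(platform.name, "-", platform.version)
    for device in platform.get_devices():
        print("   ", device.name, "|", device.version)
```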
Great video!
Would it be possible for AMD to create its own Titan by having an RDNA die and a CDNA die in an SoC?
Would they be able to use async compute to feed the CDNA die and boost raytracing calculations?
Hi Wendell, I think you mistook Fortran for Cobol here. Fortran is used in science applications that get sent to HPC clusters, not really useful for finance.
He definitely means Fortran here. Fortran, C, and C++ are the best supported languages for GPU programming. Those languages also have the OpenMP support he mentioned.
@@OGBhyve I know but his explanation that Fortran exists because of legacy finance applications is a Cobol backstory. I am a fellow HPC guy, I know Fortran very well.
@@DarkReaper10 It's used in Finance too, but I see your point that it is more popular in scientific applications.
I’ll set this up on my desktop tonight, been watching rocm for a while. Maybe I can finally retire the M40
0:48 AAAAHHHHHHHHHH!!!!!!!! him no want things plugged in his body
One bug I found with ROCm is that it just doesn't work at all if you mix a Radeon Pro Duo Polaris with an RX Vega 64. It just doesn't detect anything if you mix cards. Pretty frustrating.
I mean is mixing cards any sort of norm? Not making excuses (it not working sucks), merely pointing out that may not exactly be a priority usecase for fixes.
@@TheKazragore I agree. I'd imagine with it being a relatively niche scenario, nobody would've tested it or even considered it.
I just compiled ROCM again yesterday and my issue seems to have been fixed now, so happy days :)
Patrick Boyle runs a finance channel and might be willing to work with you on actually using OpenBB
4:20 - A little server room ASMR for us all.
Great video. I would just like broader (Radeon card) support. I was playing with ROCm since its release on the RX 480, but totally lost interest with the lack of RDNA(1) support, and even the RX 480 lost its official support. And all the details with PCIe atomics and almost no laptop dGPU and APU support.
But again, nice that they at least have enterprise support.
I'd love to see some follow up on this one.
Level 1? More like level 2000. I didn't understand a word until I heard Fortran.. maybe because I'm a COBOL programmer :)
Let's get that onto a 1ru tray.. nice.
Get this man some MI250Xs
@Level1Techs does Nvidia or Intel have a direct competitor against the Instinct accelerators?
Hi, is it possible to do a basics video about ROCm ? Sorry to bother you and thanks.
Also what are the differences in uses between EPYC , Threadripper CPUs and the many different GPUs like AMD Instinct ones Vs Nvidia A6000?
Why are they only supporting OpenCL on Instinct?
Why don't we have Vulkan or a new VulkanCompute version available?
I heard OpenCL is stuck
OK. I hope to also see some more modern Radeon Instincts here, unless the MI210 is one. IDK how AMD names those cards honestly, but I did hear about the MI250 and MI300 - the latter of which probably isn't out yet. I hope someone will educate me on this, because honestly a quick Google search turns up a lot of sources that IDK if I should trust.
It's the same architecture as the MI250; they are both CDNA2, launched in March this year.
MI250 was launched 11/2021, MI210 03/2022, MI3xx is expected for 2023.
The day CUDA is not the only option will be a good day
i guess Wendell has electricity bills
Every time I see Wendell go into the server room, all I can think of is: where is your hearing protection, Wendell?
Do a dnetc run on it?
Could you tell the wire monkey to wear hearing protection, so that they don't get hearing damage? You got me laughing with tears about the #shining and AMD's relentless execution!
Why hasn't AMD upstreamed their TensorFlow support?
ROCm has a supported TensorFlow repo on their GitHub
@@intoeleven Yes. They have a fork of TensorFlow. Which is why I've asked why they haven't upstreamed it. If it isn't mainline, it doesn't really matter that much.
@@garrettkajmowicz They are upstreaming and syncing it constantly. Their own fork is for customers.
Did the title change? Or am i high.
Title changed. Views are low and we're hoping the title change will fix it ~Editor Autumn
@@Level1Techs Great video, as always.
Thanks!
Let the clicks and engagement rise up.
@@Level1Techs Is there any way to get notifications as soon as the video drops?
Discord notifications work for me for some channels; is there something similar on the forums for us free-tier folks, other than YouTube?
250v 20 Amps, at some point that could cook food or boil big amounts of water, that's super serial seriasly serial
What does it cost
But can it run Cuda?
I suppose in the future we will look at Intel, their accelerator hardware (GPU Max?) and software stack (oneAPI), which includes all kinds of solutions. None of which seem finished, though.
There already is an open standard for this:
OpenCL!
It runs on pretty much everything (including CPUs, FPGAs, and GPUs) and with OpenCL 3 you also get a newer version than 1.2 on Nvidia devices.
Why do we need a new standard if we can just use the one that already exists and has support from every major vendor?
As for why Oak Ridge chose AMD for Frontier -- my guess is that Nvidia has massively optimised their silicon for AI workloads, where AMD has targeted more general GPGPU compute workloads. For a general purpose HPC system, FP64 is critical. Looking at the relative FP64 performance (especially FP64/W) shows how wide the gap is. Why Facebook/Meta are looking to switch? Given I'd imagine most of their workload is AI/ML, that's a much tougher puzzle.
I don't see Meta swapping vendors but I can see them bringing up their cool new software every time they need to buy a batch of Tesla cards.
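On the FP64 point above: if you want to eyeball that gap on whatever card you have, timing a big matmul in both precisions gives a rough ratio. A crude sketch (not a rigorous benchmark; consumer GPUs typically run FP64 at a small fraction of their FP32 rate, while the datacenter parts are much closer to parity):

```python
import time
import torch

def matmul_tflops(dtype, n=4096, iters=10):
    """Rough sustained matmul throughput on the GPU for one dtype."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return 2 * n**3 * iters / (time.perf_counter() - t0) / 1e12

fp32 = matmul_tflops(torch.float32)
fp64 = matmul_tflops(torch.float64)
print(f"FP32 {fp32:.2f} TFLOP/s | FP64 {fp64:.2f} TFLOP/s | ratio {fp32 / fp64:.1f}x")
```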
but can it run crysis
Can ROCm be set up on Debian rather than Ubuntu?
Anything can. I run Debian Bookworm with everything from PerceptiLabs, Anaconda with all the juices even with SecureBoot and Nvidia and their CUDA Toolkit. Once up and running, no upgrade hell as with Ubuntu.
No mention of consumer AMD GPUs? It kinda feels like AMD doesn't care about ml. Researchers use CUDA because it's officially supported on their desktops.
@@Cooe. Meh. Many off the shelf models require CUDA for at least one layer. Still makes no sense to use AMD for machine learning
but, can 1 MI250 run 64 instances of CRYSIS 64-bit
IM SUPER SERIAL GUYS! CONSOLE CABLES ARE REAL!!1!
IM SERIAL! :D
17:02 Doom 2016 LOL
In my humble opinion AMD should focus on a unified memory architecture like Apple's M-series CPUs. You cannot offload a lot of computations to the GPU because the memory transfer requirement simply kills your gains. A unified architecture makes every operation a target for acceleration, and Nvidia has no answer for this since they only make GPUs. AMD CPUs with built-in GPUs could break benchmarks for both Intel and Nvidia.
Correction:
I'm such a fool. HBM unified memory will come to AMD datacenters with MI300 in 2023. They announced it at Financial Analyst Day 2022. I can't believe I missed it.
Xilinx might be able to help with accelerators, but it's a few years off before we see any applications in the consumer realm.
You realize AMD was the one who created the HSA foundation right? HSA was demonstrated before Zen 1 existed. When AMD moves on this, no one will be doing it better.
@@jesh879 Yeah I know, but HSA only includes cache coherency (that's what I understand from v1.2 of the standard), while Apple's implementation goes beyond what AMD or Intel called UMA. In the M1, the CPU and GPU share the same RAM and can utilize it when needed.
Blender would be Fun.
MM laser...it is fine. :D xD
Cloud managed IOT can go straight to hell. They should ship an app that runs on your phone and provides an API that the IOT gear detects, and let you pair with bluetooth or with a button and extreme close range (easy to detect with the WiFi or BT hardware.) After that you should be able to manage it from the same app running on your PC, and you should be able to install a PKI signature onto the IOT device which forever locks it to a cert under your control, so it can't be hijacked, not even by your child/spouse/roommate/landlord etc.
IoT is dumb; Internet Protocol is overkill for a lightbulb. Matter over Thread is the future, Z-Wave/Zigbee for right now.
I don't know shit about HPC, but it seems everyone likes open source.
It kind of felt like I bought stuff from the right company.
Just compare it with supercomputers from about 20 years ago
Wendell is gonna suddenly disappear and we won't hear from him for 6 months, and it'll turn out that while making his video about using the Supermicro on the stock market, within 5 minutes of turning it on he managed to become the 3rd richest man on the planet and spent the last 6 months on HIS private island LOL :D
All they have to do is make hip *checks notes* not shit
It’s still a PITA to work with
This title will get clicked by no one who is not a serious enthusiast/nerd.
Intel CPUs work excellently with PyTorch, and it should be easy to add their new GPUs considering oneAPI; AMD, not so much. Let's hope it changes in the near future and AMD's software gets better performance and some stability.
I don't know anyone who uses AMD over Nvidia in machine/deep learning right now because of ROCm's extremely poor quality and the problem of consumer GPUs not working with ROCm at all, so you can't develop locally. But there are a few folks working in scientific computation, mostly focused on HPC, who use AMD for float64 calculations.
I use ROCm over CUDA sometimes. I've benchmarked a fair amount of TensorFlow code for my research and it is neck and neck with last-gen hardware (Radeon VIIs vs A/P100s). It is very easy to get it running, particularly if you use the ROCm Docker images for your tool of choice. And the TensorFlow/JAX code just runs with no modifications.
@@RobHickswm Cool, but TensorFlow isn't PyTorch. I tested a 3090 against Radeons at close price points and they are always a few times slower. Maybe in the extremely high-end datacenter cards they are close enough, but I don't have any AMD card to test it.
@@dawwdd I've only tested the Radeon VII (which uses HBM2 memory like the datacenter cards) and for things I'm doing (not canned ML benchmarks) it is as fast/faster than the Nvidias with a few exceptions here and there depending on the op. You're right. Not pytorch, just jax and tensorflow.
now they should make an ai accelerator that doesn't cost a kidney
Sorry, "AI accelerator" contains two $-add words, so $$$ instead of $
Come on, trading? Is that the best usage for this hardware?
rocprof is soon to be hidden under a GUI called MIperf that has yet to be released by AMD but is available on Crusher (a TDS of Frontier).
It will provide information similar to what Nsight Compute does. IMO tooling was one of the last big problems with working with AMD cards.
Should I get this or the 4090?
Greek
first?
yup
good boy
Maybe Intel and Nvidia will learn that they shouldn't rely on being a monopoly for their success.
What are you blabbering about? Intel provides more open source code to Linux than all the other PC players combined
@@HellsPerfectSpawn Yeah, totally of their own accord, not because their monopoly was so severe that they literally had to, after years of ILLEGALLY doing MSFT's bidding.
@@evrythingis1 ??? What mental gymnastics are you jumping through mate?
@@HellsPerfectSpawn Do you not know anything at all about Intel's history of Antitrust violations!?
@@evrythingis1 Again what kind of mental hoops are you jumping through. Are you trying to suggest that because Intel got sued in Europe it suddenly found a reason to go open source??
Dude wth is that framed picture on your desk??? Looks satanic…
Second, haha
And now it's May 2023 and nobody cares about ML on AMD cards. Unless it's a drop-in replacement, nobody will migrate their massive ML tech stacks to, eh, what do you call it.. Radeon?
It's a shame this guy doesn't have any kids. He has so much knowledge crammed inside his head.
The YouTube channel is his baby
We are his spawn
He doesn't need children to leave a legacy. _We_ are his legacy.
Well, he has a forum and a YouTube channel...! He's teaching many more people than just the kids he doesn't have!
@@Blacklands a forum is way better than having your own child and seeing your legacy live on along with everything that you can pass on to him besides tech stuff. You're so wise, maybe he can put that on his tombstone "I have a forum"