This guy is like the HITMAN of NVIDIA, he just simply murders all its competitors 🥶
I feel like Cuda has been demystified. Very glad I found your series.
amazing and it is everything someone that wants to learn the basics ever needs. I am a true believer that the most important thing is to get a grasp of the intuition and then slowly try to dive deeper into any topic
now that I again watched it... I have no words to say more than FANTASTIC .. clarity, knowledge, and everything else ..
Subscribed ! Excellent work.
This had LTT / LMG levels of production value with one of the best / clearest explanations for what CUDA is and why it matters.
Anyhow, thank you for your comment! I'll definitley talk about it in due time 😉 ...it's just too soon, it's only the first episode of this parallel computing series
By the way, I'm premiering the next episode in this parallel computing series in 45 minutes - come say hi! 😁
I absolutely agree about the cooling! it's a key component, especially in overclocked systems. a good air circulation will ensure your hardware lasts for much longer!
Padding in Magical. Awesome explanation!!!
I use CUDA with HFSS - a numerically intensive electromagnetic solver (solves/satisfies Maxwell’s equations in 3D space).
Super clear explanation Ahmad, great video, thank you!
Unbelievably clear video. Thanks 🙏
Ahmad thank you, my 8yo daughter is very happy with your channel, the graphics help, and representation, it matters.
This would be two reads with no bank conflicts. Threads would all read from consecutive banks to get the first value, then they'd broadcast from bank 0. The two values would be read into registers; operations are not performed in any memory other than registers. The code looks like it's requesting two values from shared memory at once, but it's not. The PTX would show that the two reads are separate instructions.
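The bank mapping this comment describes can be sketched in a few lines. This is a minimal illustration, assuming the common configuration of 32 banks of 4-byte words and a warp of 32 threads; the access pattern (consecutive floats, then a shared word at offset 0) is the one the comment talks about, not code from the video:

```python
# Sketch of shared-memory bank mapping (assumption: 32 banks, 4-byte words).
NUM_BANKS = 32
WORD_BYTES = 4

def bank_of(byte_address: int) -> int:
    """Bank that the 4-byte word at this address maps to."""
    return (byte_address // WORD_BYTES) % NUM_BANKS

# First read: 32 threads each load a consecutive float -> 32 distinct banks,
# so the warp is serviced in one transaction with no conflicts.
first_read = [bank_of(tid * WORD_BYTES) for tid in range(32)]
assert sorted(first_read) == list(range(32))

# Second read: all 32 threads load the same word at offset 0 -> one bank,
# which the hardware services as a broadcast rather than a 32-way conflict.
second_read = {bank_of(0) for _ in range(32)}
assert second_read == {0}
```

A conflict only arises when different threads hit *different words* in the same bank; many threads reading the *same* word is the broadcast case.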
Great video! Please continue with this series.
It's very informative and a good intro to CUDA programming. Thanks very much!
Thanks for watching, waiting and commenting!
Great presentation on GPU architecture, performance tradeoffs and considerations.
This is really helpful for my computing. Thank you.
Thank you so much, glad you liked it!! 😃
I'm in love with your computer specs now: an i9-12900K with an RTX 3090, damn, that's an absolute beast of a PC.
Thank you so much! Glad you liked it! 😃
Thank you for the response!!! I am also a newcomer to robotics))) But I love OpenCV, and PCL looks promising (I'd been working on PCL, but without the MS library). ROS folks use Python a lot! So take a rest and look into the topic, please!!!
Very nice introduction. Using additional software like GPU-Z while processing data (e.g., training a neural network), you can check your GPU load and temperature. When processing big chunks of data for a long time (days), check that your GPU doesn't exceed its maximum junction temperature (in my case it was 100°C). Another thing: for PCs it is really important to have good coolers and notable spatial separation between the CPU and GPU (GPUs tolerate high temperatures better than CPUs).
The software works with CUDA. I think their competitor that makes CST also uses video card memory for ultra-fast calculations on insanely large matrices (matrix inversion).
Excellent detail! Thanks for the upload
As a graphics engineer, I recommend you use OptiX if you do not need any rasterization features. DirectX or Vulkan is way too verbose and complex for personal projects.
Just what I needed! Thanks!
Awesome video, I found it quite useful. There are two minor errors: first, when you initialize the max (d_max) to 0, it will return that zero if all the numbers in the array are less than 0; it should be initialized to some element of the array. Second, the kernel should also initialize temp with a value from the array (temp = array[0]);
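The bug this comment reports can be shown with a plain sequential sketch (Python stands in for the CUDA kernel here, and the sample array is made up for illustration):

```python
def max_buggy(array):
    # Initializing the running max to 0 silently fails
    # whenever every element is negative.
    m = 0
    for x in array:
        m = max(m, x)
    return m

def max_fixed(array):
    # Initialize with an element of the array instead,
    # as the comment suggests (temp = array[0]).
    m = array[0]
    for x in array:
        m = max(m, x)
    return m

data = [-7, -3, -12]          # all-negative example
assert max_buggy(data) == 0   # wrong: 0 isn't even in the array
assert max_fixed(data) == -3  # correct maximum
```

The same reasoning applies per thread in the kernel: any seed value that might not be dominated by a real element (like 0, instead of array[0] or -infinity) can leak into the result.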
Thank you for posting this, it helps a lot!
This was really good. Thanks for posting this!
This is great! Shall continue to support you on YouTube. It was simple and actually clearer than any other tutorial.
GPU and CPU are both good in their own way! Merry Christmas ☃️! The video is really good, you did well!
Perfect video! It was revealing to me to understand how it works. Thank you! I am a new subscriber of your channel. Regards from Buenos Aires, Argentina
Thank you so much!! Will do! 😃😃😃
And here it's 13 today, cold, a cold winter; the country is Cyprus, I've lived here for two years now. For a programmer you're very pretty, a real geek's dream :) Found your video about scraping :) Then I subscribed and kept watching :)
That is not a computer, that is a god-level beast. 3090 + i9, crazy!! Great video
Excellent video! thank you Mariya 🌷 ❤️
wanted to comment that the information in this presentation is very well structured and the flow is excellent.
Congrats on finishing the course!! 🥳🥳🥳 I hope you had lots of fun!!
Wow, that flicker is really really cool! :P Awesome tutorials, thank you for making these tutorials.
Ahmad Bazzi ! Thank so much! You're the best
Awesome explanation!! 👏🏼👏🏼👏🏼
Wow!!! Thank you for sharing.
Great tutorial! :)
Thank you Sarwar! I'll boost the volume on future tutorials, thanks for letting me know 😃
Thanks a lot really got me started .
Looking forward to CUDA 13.0 🚀😍
I might do an OpenCL vs CUDA speed test in the future, sounds like a fun project!
VERY helpful, thank you!!!!
Nice video for beginners, I will point my students to your channel.
All I ever wanted to know about CUDA striding and kernels.
Merry Christmas Ali!! You too! 😀😀😀
We will not only use CUDA, but we will also use something called TensorRT - which is accelerating the prediction process in particular.
11:00 you are watching a Master at work
Great tutorial as usual thanks!
Hyper-V + RemoteFX + CUDA server = perfection, u .u
But if you have a lot of tasks that depend on each other finishing in sequence, or complicated code with a lot of branching, or if you want the hardware to work out for you which parts of the code can run in parallel, then your code will run faster on a CPU than a GPU.
Amiga 1983 yes.
Hey, thanks for explanation! Very well done 👍 I am downloading CUDA 💪
Happy new year and see you soon in a brand new tutorial! 🥳🥂🎆
Very informative, and the visuals helped me comprehend it. I didn't purchase a GPU card last time due to fear of Ubuntu compatibility. I'll get a moderate card in the upcoming days; I think I got a good power source too, but gotta check.
so thank you beast ,, thank you M .... I'm following what you present .. as always..
Thanks for watching and have a great day!
You're right that with x86 we're looking at 1 or 2 threads per core.
You explain it very well
Thanks for letting me know! I got a bunch of folks reporting the same issue, I'll check if I can turn off YouTube's involvement in the comments :)
Thank you so much Ahmad! 😀
Could you share a video regarding the implementation of an image processing algorithm?
GPU-agnostic compute acceleration, i.e., writing software compute shaders that execute on the hardware shader units in GPUs.
I am also going to try to get this to work on my all AMD desktop computer.
Would you like to make a video on building or creating a Single node level task scheduling for deep learning based RLScheduler in spark cluster?
Happy new year and see you in 2022!!! 🥳🎆❄
I'll post an equivalent OpenCL codealong soon! It's very similar to CUDA but it works for all GPUs - not just Nvidias! 😉
It sounds complex, but it actually involves only a few lines of code! (as some very smart folks have already built this model for us and already trained it on the CIFAR dataset... we're just loading it and using it for our own stuff 😉)
What a fabulous girl and how smart you are !
(I might actually film a tutorial on it as well in the future... I want everyone to enjoy this series, not just the folks with Nvidia GPUs)
Great tutorial! 💪🏻
Well, I just built a new rig with a 980 Ti and a 4790K, so I'm gonna put that to the test. Thank you for your wonderful explanation :D
Excellent explanation, keep going with this content man ;)
Thanks, Ahmad! Very informative ))) Could I vote on a topic? Using Python in robotics (such as ROS) seems useful to dig into...
Thank you K.Ballaji Axe! 😀
Thanks for sharing this.
You need to put some soft material under the keyboard or relocate the mic. The noise when you type is similar to good drum and bass music.
P.S. You have inspired me to run my multiprocessing test on my new 5800XT with 12 cores. I haven't tried it over there yet.
thank you. good video!!! it was very helpful
You're absolutely welcome! 😃
Love nvidia jetson orin ❤️
It's true enough for this explanation.
Thanks for the great work!
nice to hear that!
Excellent stuff.
Thank you so much! 😃
That is one beefy computer. The GPU alone is like $2500 right now 1/1/22. Poggers.
Of course, in normal spaghetti code you can't really expect the CPU to be able to do particularly many of those instructions at once.
Thank you! 😃
Please make videos on image processing using CUDA!!
Very helpful, thank you.