Lovely tutorial. Far superior to sifting through all the spaghetti that is the CUDA documentation.
These are some high quality tutorials! I really hope you’ll get more traffic to your channel soon!
Glad you are enjoying them!
This series of tutorials is very helpful for me, a novice who is just learning CUDA! Keep it up!
Love this high-quality tutorial. You have saved me so much time, thanks!
Thanks for the CUDA videos! They are very helpful!
Thank you. This is a high-quality tutorial. I have checked out your channel and it is great. Keep it up!
Thanks! Glad you have enjoyed the content!
Hi, I'm from Argentina. I would love to see the next episode of this series!
I'm new to the CUDA architecture, so I'm getting started with these videos and the book CUDA by Example.
Thanks a lot for these videos!
Any book you'd recommend for me?
Greetings!
Great video. I learned a lot. Thank-you sir.
Hi, Nick; amazing tutorial; I was looking for something like this; it gets down to coding right away.
nvprof is no longer supported on devices with compute capability 8.0 or higher. To profile on those, one can use, for example: nsys profile --stats=true -t cuda ./vector_add_unified_memory.out
Also, will there be a version 2 of your previous sum reduction videos? Thanks a ton for sharing!
Please make more CUDA videos!
Quick question, sir: if the size of the memory is bigger than the GPU's max size, what will happen? For instance, if you have 3 GB of data allocated on the host, how does unified memory deal with it? Will it send it over in chunks, or what?
Good question - it somewhat depends on the GPU arch you're working with. For some, you'll be limited by the max capacity of the GPU (i.e., you can't reserve a unified memory region larger than what is available on the GPU). Newer GPUs (Pascal and later, and only on Linux) support memory oversubscription, where you can reserve more than what is available on the GPU (and data will be paged back and forth as needed).
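To illustrate the reply above, here is a minimal sketch (not from the video; the kernel and sizes are placeholders) of allocating a managed region with cudaMallocManaged. On Pascal-or-later GPUs under Linux, such a region may exceed device memory and pages migrate on demand; on older GPUs or other platforms, an oversized allocation simply fails.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: doubles each element of a managed buffer.
__global__ void scale(float *a, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) a[i] *= 2.0f;
}

int main() {
    const size_t n = 3ULL << 28;  // ~3 GB of floats (example size only)
    float *a = nullptr;

    // cudaMallocManaged reserves a unified region visible to host and device.
    // With oversubscription support, n * sizeof(float) may exceed GPU memory.
    cudaError_t err = cudaMallocManaged(&a, n * sizeof(float));
    if (err != cudaSuccess) {  // without oversubscription, a too-large request fails here
        printf("alloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (size_t i = 0; i < n; i++) a[i] = 1.0f;  // pages first touched on the host
    scale<<<(unsigned)((n + 255) / 256), 256>>>(a, n);  // pages fault over to the GPU as accessed
    cudaDeviceSynchronize();

    cudaFree(a);
    return 0;
}
```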
@@NotesByNick Thank you for your quick reply, sir. BTW, good videos - hope to see more.
Thank you!
Happy to help!
Hi, thanks so much for all the videos, which have been very helpful. If you don't mind, I have a question. For some reason my device has significantly more transfer counts than yours, even when using the prefetch code. Do you know what might be the issue?
==25060== Unified Memory profiling result:
Device "GeForce RTX 2080 Ti (0)"
Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
   16  32.000KB  32.000KB  32.000KB  512.0000KB  292.5000us  Host To Device
   36  35.555KB  32.000KB  128.00KB  1.250000MB  6.072400ms  Device To Host
Different GPUs and driver versions will likely behave differently. I had to play around with my hints to get the results I did in the video. Playing around with the hints on your system will likely be your best bet. Hope this helps!
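For anyone tuning these hints on their own system, the calls in question look roughly like this. This is a sketch, not the video's code; the buffer size is a placeholder, and the advice flags worth experimenting with vary by system.

```cuda
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;  // placeholder size
    int device = 0;
    cudaGetDevice(&device);

    float *a = nullptr;
    cudaMallocManaged(&a, bytes);

    // Optional hint: tell the driver where the data will mostly live.
    cudaMemAdvise(a, bytes, cudaMemAdviseSetPreferredLocation, device);

    // Prefetch to the GPU before launching kernels, to avoid
    // on-demand page faults during the kernel itself...
    cudaMemPrefetchAsync(a, bytes, device);
    // ... kernel launches would go here ...

    // ...and back to the host before reading results on the CPU.
    cudaMemPrefetchAsync(a, bytes, cudaCpuDeviceId);
    cudaDeviceSynchronize();

    cudaFree(a);
    return 0;
}
```

Note that these prefetch/advise calls require concurrent managed access support (Pascal or later on Linux); elsewhere they are effectively no-ops or unsupported, which is one reason results differ so much between systems.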
@@NotesByNick Hi, thanks for the reply. I believe it's because I'm on Windows, and the prefetch commands don't work there, hahaha.
Ah, makes sense! Haha