Among all the CUDA videos I've watched, this one made the most sense to me.
true
It is like impossible power of computation! Beautiful beast!
Amazing lecture. Helped me a loooooot for my final exam. Thank u soooo much. ❤️❤️❤️
Amazing info! Love the way the data flow and execution is explained!
Really enjoyed watching the vid. I've been learning computer architecture with nand2tetris and Digital Design and Computer Architecture by David Harris and Sarah Harris. I'm so happy to be able to understand the concepts he was talking about in this vid. Anyway, thank you for the excellent, beginner-friendly content.
This is a very good video explanation of GPU computation.
Great lecture, thanks for sharing! Thanks also for the interesting piece of history on how the "bug" concept came to be.
best cuda explanation ever
Cheers mate! Always love a good programming lecture. :)
Great Lecture! Very helpful!
Great tutorial. Thank you !
Excellent introduction! Thanks!
Hi Tom, at 16:36, on line 19, you should change "float(i);" to "(float) i;". I'm assuming you're trying to cast the integer value to a floating-point data type.
Why did you need to use "float f" at time index 30:00? Why didn't you combine everything into one line: "d_out[idx] = d_in[threadIdx.x] * d_in[threadIdx.x]"? Is there a penalty for reading the thread index multiple times, or did you do it just for clarity and to explain how the code works?
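For anyone wondering the same thing, here is a minimal sketch of the two variants side by side. It assumes a kernel shaped like the one in the video; the names `square_temp` and `square_inline` are made up for this comparison. `threadIdx.x` is a built-in per-thread variable, and a compiler would typically keep the loaded value in a register either way, so the temporary is most likely there for readability. That is an expectation, not something measured here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Variant with a temporary, in the style described for the video.
__global__ void square_temp(float *d_out, const float *d_in) {
    int idx = threadIdx.x;
    float f = d_in[idx];
    d_out[idx] = f * f;
}

// One-line variant suggested in the question (hypothetical name).
__global__ void square_inline(float *d_out, const float *d_in) {
    d_out[threadIdx.x] = d_in[threadIdx.x] * d_in[threadIdx.x];
}

int main() {
    const int N = 64;
    const size_t bytes = N * sizeof(float);
    float h_in[N], h_a[N], h_b[N];
    for (int i = 0; i < N; ++i) h_in[i] = float(i);

    float *d_in, *d_a, *d_b;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // One block of N threads for each variant.
    square_temp<<<1, N>>>(d_a, d_in);
    square_inline<<<1, N>>>(d_b, d_in);

    // These copies wait for the kernels above to finish.
    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b, d_b, bytes, cudaMemcpyDeviceToHost);

    // Compare the two outputs element by element.
    bool same = true;
    for (int i = 0; i < N; ++i) same = same && (h_a[i] == h_b[i]);
    printf("%s\n", same ? "outputs match" : "outputs differ");

    cudaFree(d_in);
    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```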
How do you ensure that the threadID does not go out of bounds of the array? I could have 1000 threads right? But only have 60 elements in array to square.
You pass the array size as the thread count when you launch the kernel, e.g. square<<<1, arraySize>>>, which ensures only 64 threads are created.
Very neat! Thank you!
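If the thread count cannot match the array size exactly, a common alternative is to pass the element count to the kernel and have each thread check its index before touching memory. Below is a minimal sketch of that guard, not the code from the video: the kernel name `square_guarded`, the extra parameter `n`, and the sizes 60 and 1000 (taken from the question) are assumptions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles at most one element; threads whose index is
// past the end of the array do nothing.
__global__ void square_guarded(float *d_out, const float *d_in, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float f = d_in[idx];
        d_out[idx] = f * f;
    }
}

int main() {
    const int N = 60;          // elements to square
    const int THREADS = 1000;  // deliberately more threads than elements
    const size_t bytes = N * sizeof(float);

    float h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) h_in[i] = float(i);

    float *d_in, *d_out;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // One block of 1000 threads; only the first 60 pass the guard.
    square_guarded<<<1, THREADS>>>(d_out, d_in, N);

    // Report launch errors, if any.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i) printf("%g\n", h_out[i]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The extra 940 threads still launch, but they exit without reading or writing anything, so the array bounds stay safe.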
Thank you so much for the video! Quite helpful. Appreciate it :D
Could you have squared the d_in array in place? So d_in[idx] = d_in[idx] * d_in[idx]?
Can you tell me what threads are? I'm new to the GPU world 😁
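Squaring in place should work here, since each thread reads and writes only its own element and no thread depends on another's result. A minimal sketch under that assumption follows; the kernel name `square_in_place`, the bounds parameter `n`, and the array size of 64 are made up for the example. The trade-off is that the original input on the device is overwritten.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread overwrites its own element with its square.
__global__ void square_in_place(float *d_data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        d_data[idx] = d_data[idx] * d_data[idx];
    }
}

int main() {
    const int N = 64;
    const size_t bytes = N * sizeof(float);

    float h_data[N];
    for (int i = 0; i < N; ++i) h_data[i] = float(i);

    // A single device buffer serves as both input and output.
    float *d_data;
    cudaMalloc((void **)&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    square_in_place<<<1, N>>>(d_data, N);

    // Copy the squared values back over the host array.
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i) printf("%g\n", h_data[i]);

    cudaFree(d_data);
    return 0;
}
```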
15:20 Single Instruction Multiple Threads
You could add timestamps
Great explanation! Thy
*thx not thy
nice boy
Amazing !!
Great tutorial! Thank you so much!