Algorithms in CUDA: dot product

  • Published: 1 Dec 2024

Comments • 8

  • @салаватишбулатов

    Thanks for the video

  • @StratosFair · 3 years ago

    Thanks for the video, it was helpful :)

  • @hermitibis1811 · 3 years ago

    Thanks from the COVID era, this video helps :D

  • @rdavid5904 · 1 year ago

    Hi, can you help me with one question: why do you get the same results every time you run the program?

    • @jamessandham1941 · 1 year ago

      Everything in the kernel is deterministic except the atomicAdd. Depending on the order in which the atomicAdds are performed across all thread blocks, you may get slightly different floating point results.
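
      For concreteness, here is a minimal sketch of the kind of kernel being discussed (an illustration, not the video's code; the kernel name, the block size of 256, and the grid-stride loop are assumptions):

      ```cuda
      // Hypothetical dot-product kernel: per-block reduction in shared
      // memory, then one atomicAdd per block into the global result.
      // The host must zero *result before launch.
      __global__ void dot_kernel(const float* x, const float* y,
                                 float* result, int n)
      {
          __shared__ float cache[256];   // assumes blockDim.x == 256
          int tid = blockIdx.x * blockDim.x + threadIdx.x;

          // Grid-stride loop: each thread accumulates a private partial sum.
          float sum = 0.0f;
          for (int i = tid; i < n; i += gridDim.x * blockDim.x)
              sum += x[i] * y[i];

          cache[threadIdx.x] = sum;
          __syncthreads();

          // Tree reduction within the block -- deterministic for a fixed
          // block size and input.
          for (int s = blockDim.x / 2; s > 0; s >>= 1) {
              if (threadIdx.x < s)
                  cache[threadIdx.x] += cache[threadIdx.x + s];
              __syncthreads();
          }

          // The one non-deterministic step: blocks add their partial sums
          // in whatever order they happen to finish, and floating-point
          // addition is not associative, so the last bits can vary per run.
          if (threadIdx.x == 0)
              atomicAdd(result, cache[0]);
      }
      ```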

  • @IgorAherne · 6 years ago

    Hey James, thanks for the video!
    At 9:20 you showed that the kernel is 5-6 times faster than the CPU version.
    However, the CPU version is single-threaded. Would the CPU version be equivalent if we made it run on several threads, for example on my 8-threaded CPU? I think it would actually win over the GPU.
    Thanks!

    • @jamessandham1941 · 6 years ago · +2

      I haven't actually implemented a multithreaded CPU version, so I can't say for sure, but I suspect you could make it just as fast (or maybe even faster?) using OpenMP or an equivalent. This code was for demonstration purposes: you wouldn't really compute just a dot product on the GPU, as there wouldn't be enough computation to saturate the GPU's potential. What you might do, however, is implement a more complicated algorithm that itself involves computing dot products as one of its steps.
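
      For reference, a multithreaded CPU version along these lines can be a few lines of host code with OpenMP (a sketch, not the video's code; the function name `dot_cpu` and the test sizes are illustrative; compile with e.g. g++ -fopenmp or nvcc -Xcompiler -fopenmp):

      ```cuda
      // Host-side OpenMP dot product (illustrative, not from the video).
      #include <cstdio>
      #include <vector>

      double dot_cpu(const std::vector<float>& x, const std::vector<float>& y)
      {
          double sum = 0.0;
          // Each thread accumulates a private partial sum; OpenMP combines
          // them at the end. As with atomicAdd on the GPU, the combination
          // order is unspecified, so the low bits may differ between runs.
          #pragma omp parallel for reduction(+ : sum)
          for (int i = 0; i < (int)x.size(); ++i)
              sum += (double)x[i] * (double)y[i];
          return sum;
      }

      int main()
      {
          std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f);
          printf("dot = %f\n", dot_cpu(x, y));   // expected: 2 * 2^20
          return 0;
      }
      ```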