What Are SIMD Instructions? (With a Code Example) [DSP #14]

  • Published: Nov 27, 2024

Comments • 21

  • @WolfSoundAudio
    @WolfSoundAudio  2 years ago +2

    Have I helped you with this video? If yes, please, consider buying me a ☕ coffee at www.buymeacoffee.com/janwilczek
    Thanks! 🙂

  • @auditiv0276
    @auditiv0276 1 month ago

    If you want to make sure you compile with SIMD instructions specific to the host CPU, you can use LLVM bindings for the language of your choice and then compile through LLVM. Interesting vid!

  • @chen-kim9440
    @chen-kim9440 6 months ago +1

    Thanks for your great introduction and lively demo! I really like your pace!

  • @niranjanm5942
    @niranjanm5942 1 year ago +2

    Thanks, this was a great intro to this topic. I wanted to get started with SIMD, and this will put me on the right track.

  • @moliver_xxii
    @moliver_xxii 2 years ago +2

    Hey, this is a difficult topic and you can't find anything about it on the Internet, so thank you, Jan!

    • @WolfSoundAudio
      @WolfSoundAudio  2 years ago +2

      I'm very glad, thanks to you too!

  • @alldyallnite
    @alldyallnite 2 years ago +1

    Thank you Jan!

  • @cliffmathew
    @cliffmathew 2 years ago

    Great job explaining, and demonstrating. Thank you.

  • @theruisu21
    @theruisu21 1 year ago

    Great video! Looking forward to the next one. Next time, could you include more on the ARM and RISC-V cases?

  • @NecdetSanli
    @NecdetSanli 1 year ago

    You made the concept easy to understand, thank you. Would like to see some C examples if it's possible too.

  • @KeypleezerOfficial
    @KeypleezerOfficial 1 year ago

    Nice video & nicely paced and clear. Just what I needed to get this topic a bit more. Just need some more examples of calculations actually taken care of by the SIMD extension sets, and perhaps some alternative SIMD/FFT libraries with info about what does what and how, that would be epic. Not many people teaching this in audio with such good phrasing! Keep up the great work! 👍

    • @KeypleezerOfficial
      @KeypleezerOfficial 1 year ago

      I hadn't read the article you wrote about this topic before. It is great, with much more info and depth. Thanks!

  • @ifnullreturn1
    @ifnullreturn1 1 year ago +3

    Line 13 is killing me lol

  • @davidminnix
    @davidminnix 1 month ago

    Many DSP algorithms contain single-sample feedback. Can anything be done to vectorize these algorithms? It seems like the feedback complicates any attempt to use block processing to vectorize.

  • @moisascholar
    @moisascholar 11 months ago

    Very helpful video. I was working on a particle system/simulation, and I use GL to draw the particles. Was wondering with SIMD and GL, how can I draw multiple particles at once? Or is this something more to do with GL buffers?

  • @BalakrishnanIrudhayaraman
    @BalakrishnanIrudhayaraman 1 year ago

    I understand the concept of SIMD. But in the code I can see that you add the values as soon as they are loaded into the register, which looks equivalent to scalar addition to me. I think this is done in order to avoid one more for loop to store the sums in the result array, which makes sense. This leads me to ask: does the intrinsic function perform the addition only when all 256 bits are filled with values, or can it also perform it otherwise?

  • @przekladanki
    @przekladanki 2 years ago +2

    Yes, you helped a lot ^_^

  • @omnisepher
    @omnisepher 1 year ago

    Great job,
    but didn't the second for-loop kill the entire reason for using SIMD?

    • @corporalwill123
      @corporalwill123 4 months ago

      Late reply, you probably already have figured it out by now. Responding anyway for others with the same question.
      That's like saying planes are pointless for traveling large distances, because you still need to walk the short distance to your destination from the airport.
      SIMD does most of the work: here it processes the elements in groups of 8, and the regular loop finishes whatever is left over.
      So for the plain scalar loop you are looking at a cost of:
      N*scalar
      while for the SIMD version you get:
      floor(N/8)*SIMD + (N%8)*scalar
      Since by design 1*SIMD is faster than 8*scalar, the SIMD version is faster than the plain loop for sizes greater than or equal to 8. For sizes smaller than 8, it does the same work as the plain loop plus some overhead from computing how many full groups of 8 there are.
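
A minimal sketch of the two-loop structure this reply describes, not the code from the video. The function name `add_arrays` and the constant `kLanes` are hypothetical. It assumes 32-bit floats and AVX (8 floats per 256-bit register); when AVX is not enabled at compile time (for example, without `-mavx` on GCC or Clang), it falls back to a plain inner loop over the same groups of 8 so that it still builds and runs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__AVX__)
#include <immintrin.h>  // AVX intrinsics such as _mm256_add_ps
#endif

// Hypothetical helper: element-wise sum of two equally sized float vectors.
std::vector<float> add_arrays(const std::vector<float>& a,
                              const std::vector<float>& b) {
  assert(a.size() == b.size());
  const std::size_t n = a.size();
  constexpr std::size_t kLanes = 8;  // 8 floats fit in one 256-bit register
  std::vector<float> out(n);

  // Largest multiple of 8 that is <= n, i.e. floor(n / 8) * 8.
  const std::size_t vectorized_end = (n / kLanes) * kLanes;

  std::size_t i = 0;
  // First loop: floor(n / 8) iterations, 8 elements per iteration.
  for (; i < vectorized_end; i += kLanes) {
#if defined(__AVX__)
    const __m256 va = _mm256_loadu_ps(a.data() + i);  // load 8 floats
    const __m256 vb = _mm256_loadu_ps(b.data() + i);  // load 8 floats
    _mm256_storeu_ps(out.data() + i, _mm256_add_ps(va, vb));  // 8 sums at once
#else
    // Fallback when AVX is not enabled: same grouping, scalar additions.
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      out[i + lane] = a[i + lane] + b[i + lane];
    }
#endif
  }

  // Second loop: the remaining n % 8 elements, one at a time.
  for (; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
  return out;
}
```

With 19 elements, for example, the first loop runs twice and covers 16 elements, and the second loop handles the remaining 3. With fewer than 8 elements the first loop never runs, which is the "same as the plain loop plus overhead" case.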