C++ Crash Course: Intro to SIMD Intrinsics

  • Published: Oct 10, 2024
  • In this video we look at a basic use of SIMD Intrinsics (AVX) in C++!
    For code samples: github.com/coff...
    For live content: / coffeebeforearch

Comments • 21

  • @shimblywimbles158 3 years ago +24

    It's so hard to find concise, direct info on SIMD intrinsics. Thanks for this!

  • @thomasbenardo7511 3 years ago +6

    This is the best video on SIMD. Short, concise, and to the point. Everyone else I found was just blabbering.

  • @MindGameArcade 2 years ago +3

    Excellent content, thanks!

  • @motbus3 3 years ago +2

    Well, well. Looks like YT finally sent me a C++ channel that's worth watching.

    • @preethamdbz2023 3 years ago +1

      This guy 🔥 and The Cherno channel 🔥

    • @poetryflynn3712 3 months ago

      @preethamdbz2023 The Cherno is crap outside of the C++ series.

  • @michaelmorris2300 2 years ago

    Straight to the point, clear and precise.

  • @ASD9344 4 years ago +1

    Thank you for the nice explanation. I have the following questions:
    1. How can I optimize a Euclidean distance function using SIMD?
    2. How can I use SIMD instructions in Java?

    • @CoffeeBeforeArch 4 years ago +1

      1.) You could just do multiple distances at the same time using intrinsics. There are ones for add, subtract, multiply, square root, etc. I had a project where I had to calculate millions of Manhattan distances, so I just offloaded it to the GPU.
      2.) I don't know if there's a way to use SIMD intrinsics in Java. I have never used Java, but from a brief Google search, there seems to be no easy way to do this.
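
A minimal sketch of the "multiple distances at the same time" idea from the reply above. It assumes GCC or Clang on x86 and 2-D points stored as separate x and y arrays. The names `euclidean_batch`, `batch_avx`, and `batch_scalar` and the data layout are illustrative assumptions, not code from the video or its repository. The AVX path uses only the subtract, multiply, add, and square-root intrinsics the reply mentions, and falls back to a scalar loop when AVX is unavailable.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_X86 1
#endif

// Scalar path: one distance per iteration, for indices [begin, n).
static void batch_scalar(const float* x1, const float* y1,
                         const float* x2, const float* y2,
                         float* out, std::size_t begin, std::size_t n) {
    for (std::size_t i = begin; i < n; ++i) {
        const float dx = x1[i] - x2[i];
        const float dy = y1[i] - y2[i];
        out[i] = std::sqrt(dx * dx + dy * dy);
    }
}

#ifdef SKETCH_X86
// AVX path: eight distances per iteration. Returns how many elements it
// handled so the caller can finish the remainder with the scalar loop.
__attribute__((target("avx")))
static std::size_t batch_avx(const float* x1, const float* y1,
                             const float* x2, const float* y2,
                             float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        const __m256 dx = _mm256_sub_ps(_mm256_loadu_ps(x1 + i),
                                        _mm256_loadu_ps(x2 + i));
        const __m256 dy = _mm256_sub_ps(_mm256_loadu_ps(y1 + i),
                                        _mm256_loadu_ps(y2 + i));
        const __m256 d2 = _mm256_add_ps(_mm256_mul_ps(dx, dx),
                                        _mm256_mul_ps(dy, dy));
        _mm256_storeu_ps(out + i, _mm256_sqrt_ps(d2));
    }
    return i;
}
#endif

// Hypothetical entry point: distance between point i of set 1 and point i
// of set 2, for every i.
std::vector<float> euclidean_batch(const std::vector<float>& x1,
                                   const std::vector<float>& y1,
                                   const std::vector<float>& x2,
                                   const std::vector<float>& y2) {
    const std::size_t n = x1.size();
    assert(y1.size() == n && x2.size() == n && y2.size() == n);
    std::vector<float> out(n);
    std::size_t done = 0;
#ifdef SKETCH_X86
    if (__builtin_cpu_supports("avx")) {
        done = batch_avx(x1.data(), y1.data(), x2.data(), y2.data(),
                         out.data(), n);
    }
#endif
    batch_scalar(x1.data(), y1.data(), x2.data(), y2.data(),
                 out.data(), done, n);
    return out;
}
```

Unaligned loads and stores keep the sketch simple. A Manhattan variant would replace the squares and square root with absolute differences.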

    • @ASD9344 4 years ago

      @CoffeeBeforeArch Many thanks. Euclidean distance differs from Manhattan distance only slightly: it sums squared differences and takes a square root instead of summing absolute differences. It would be great if you uploaded the Manhattan distance project to GitHub or somewhere similar. I will try to modify it for Euclidean distance.
      Doing it in either Java or C++ is no problem for me. By the way, I have found the nd4j library for Java. Nd4j makes extensive use of vectorized C++ code for all numerical operations (utilizing JavaCPP).
      What do you suggest about that?

    • @CoffeeBeforeArch 4 years ago +1

      @ASD9344 Yep, I've used both Manhattan and Euclidean distance in research work, so I'm familiar. This is the short CUDA app I wrote for calculating Manhattan distance on the GPU: github.com/CoffeeBeforeArch/research_utilities/blob/master/acceleration/m_distance.cu I really have no suggestions about anything Java-related because I have never used Java. If that library works for you, go ahead and use it.

  • @MA-nx3xj 4 years ago +2

    Nice, thanks!!

  • @hericpan5442 2 years ago

    Thank you!

  • @TheJsow 3 years ago +1

    Hi, I was just wondering: under what conditions is the compiler not smart enough to emit SIMD instructions automatically, forcing us to write intrinsics by hand instead?

    • @CoffeeBeforeArch 3 years ago +1

      For GCC, the auto-vectorizer kicks in at the -O3 optimization level (GCC 12 and later also enable it at -O2, with a more conservative cost model), or if you manually enable it with -ftree-vectorize. There are cases where your compiler may not perform vectorization (e.g., if there's an alignment or aliasing problem). Furthermore, it seems some SIMD instructions are just not produced by compilers (likely due to the high effort of matching high-level code to them, and their niche use cases). The dot-product intrinsic seems to be an example of this (I've yet to have a compiler produce it for me, and I've always had to use the intrinsic).
      Cheers,
      --Nick
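
A rough sketch of the two points in the reply above, assuming GCC or Clang on x86. The function names `scale_add`, `dot4`, and `dot4_dpps` are illustrative assumptions. The first function marks its pointers with `__restrict` (a compiler extension, not standard C++) to remove the aliasing obstacle. Whether the loop is actually vectorized still depends on the compiler and flags. The second uses `_mm_dp_ps`, the SSE4.1 dot-product intrinsic, with a scalar fallback for CPUs without SSE4.1.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_X86 1
#endif

// __restrict promises the compiler that the three arrays do not overlap,
// so it does not have to assume that a store to out[i] changes a[] or b[].
void scale_add(float* __restrict out, const float* __restrict a,
               const float* __restrict b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = 2.0f * a[i] + b[i];
    }
}

#ifdef SKETCH_X86
// Four-element dot product with the SSE4.1 dot-product instruction.
// Immediate 0xF1: the high nibble (0xF) multiplies all four lanes, the low
// nibble (0x1) writes the sum to lane 0 only.
__attribute__((target("sse4.1")))
static float dot4_dpps(const float* a, const float* b) {
    const __m128 va = _mm_loadu_ps(a);
    const __m128 vb = _mm_loadu_ps(b);
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
}
#endif

// Hypothetical wrapper: uses the intrinsic when the CPU supports SSE4.1,
// otherwise a plain scalar sum of products.
float dot4(const float* a, const float* b) {
    assert(a != nullptr && b != nullptr);
#ifdef SKETCH_X86
    if (__builtin_cpu_supports("sse4.1")) {
        return dot4_dpps(a, b);
    }
#endif
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```

To see what the vectorizer did with `scale_add`, GCC's `-fopt-info-vec` option reports the loops it vectorized.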

    • @TheJsow 3 years ago +1

      @CoffeeBeforeArch Thanks for the detailed reply!

  • @shavais33 4 years ago

    Thanks so much for this very helpful explanation.
    What happens if you use intrinsics, but it turns out that the processor the user of your app has doesn't support them?
    Is there a way to detect at run time what sort of hardware support exists?
    Is there a reference somewhere that will help us map detected hardware elements to support for particular intrinsics?

    • @CoffeeBeforeArch 4 years ago +1

      If your processor does not support the instruction behind an intrinsic, executing it will generate an invalid opcode exception (#UD). How you check this at runtime will differ based on the OS. I think all the utilities for checking what is supported are in cpuid.h. Intel has software manuals that specify which instructions/intrinsics are supported on which processors.
      Cheers,
      --Nick
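
A small sketch of run-time detection along the lines of the reply above, assuming GCC or Clang on x86. The helper names `cpu_reports_avx`, `can_use_avx`, and `pick_kernel` are made up for illustration. The first helper reads the AVX feature bit with `__get_cpuid` from cpuid.h. The second uses the `__builtin_cpu_supports` builtin, which, as far as I know, also accounts for whether the OS has enabled the AVX register state. On other platforms both helpers simply report false.

```cpp
#include <cassert>
#include <string>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_X86 1
#endif

// Raw check: CPUID leaf 1, ECX bit 28 is the AVX feature flag.
// This says the CPU has AVX, not that the OS saves the wider registers.
bool cpu_reports_avx() {
#ifdef SKETCH_X86
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0) {
        return false;  // leaf 1 not available
    }
    return (ecx & (1u << 28)) != 0;
#else
    return false;
#endif
}

// Higher-level check using the GCC/Clang builtin.
bool can_use_avx() {
#ifdef SKETCH_X86
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}

// Hypothetical dispatcher: choose a code path once, at run time, instead
// of executing an unsupported instruction and faulting with #UD.
std::string pick_kernel() {
    if (can_use_avx()) {
        assert(cpu_reports_avx());  // usable AVX implies the CPUID bit
        return "avx";
    }
    return "scalar";
}
```

A real program would typically call something like `pick_kernel` once at startup and store a function pointer to the chosen implementation.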