AVX Explained - Performance and Syntax Analysis

  • Published: Oct 10, 2024
  • Advanced Vector Extensions, a.k.a. AVX, are an extension to the x86 instruction set architecture that widens the SIMD capabilities of the CPU core itself!
    Building a Budget PC can be tough. Not only are GPUs and CPUs so incredibly expensive, but they can be hard to find on a budget... But, there are tips and tricks to finding you your dream Budget GPU, and pairing it with a CPU that will give you the performance you want!
    Also, if you're reading this far - I've got an i7-11700k review coming!
    Have a Great Day!
    Proceu
    #AVX #Intel #AlderLake

Comments • 66

  • @salvageddoor 2 years ago +8

    Damn this should've got more views... I was just searching for the AVX offset feature on YouTube and this video just came by. I'm not familiar with any kind of image processing in C++, I'm more into embedded stuff, but your content has served me quite a lot of interesting knowledge! Keep it up and your channel will blow up really soon!

    • @LegendLength 1 year ago

      First hit for me for avx2. Great video too.

  • @parad0x1cal83 2 years ago +12

    With this level of content, it's a matter of time before your channel blows up! Thank you for the explanation!

  • @ItsAkile 2 years ago +3

    This video has been in my browser for about a month+, finally watched it. pretty dank video, thanks brother I'm still getting into the groove

    • @ProceuTech 2 years ago +3

      Glad you enjoyed! It’s admittedly a pretty niche programming concept.

    • @ItsAkile 2 years ago +1

      @@ProceuTech That it is, I had it on the list of things I don't fully understand

  • @charlieike8414 6 months ago +2

    I'm taking C++ in college and we're currently learning about arrays. Only looked this video up after an LTT video where they turned AVX off in BIOS to mess up the pc. Fate brought me here to maybe ignite a deeper passion for programming.

    • @ProceuTech 6 months ago

      The other AVX video I made more recently is a much better video than this one if you want better info. Appreciate the support tho!

  • @anonymouscommentator 2 years ago +2

    Amazing video! I was interested in what AVX512 (and AVX2 in general) actually are and I found your great video explaining more than I hoped for!

  • @treelibrarian7618 1 year ago +3

    I thought it would be worth noting that just because AVX512 instructions work on 16 floats in one instruction doesn't make them faster than AVX2 instructions in practice, since as far as I know, in desktop and laptop CPUs the AVX512 instructions are limited to a single execution port in the CPU, whereas the AVX2 instructions can execute on 2 ports simultaneously (duplication of the fast add and FMA capabilities), and for simpler 256-bit vector ALU functions like and, xor, blend and integer arithmetic, there are 3 ports they can execute through for 3 instructions per clock. The biggest benefit of the AVX512 instruction set seems to be versatility, with selective operation on partial vectors via the k-registers. I believe Sapphire Rapids server and workstation CPUs have 2 AVX512 execution ports though. Zen 4 does AVX512 instructions in 2 clocks, putting each half through the same 256-bit pipeline in turn.
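
The "selective operation on partial vectors via the k-registers" mentioned above can be sketched roughly as follows. This is a minimal illustration, not code from the video: `masked_add16` is a hypothetical helper name, and the intrinsics path is only compiled when AVX-512F is enabled at build time (for example with `-mavx512f`); otherwise a scalar loop with the same per-lane behavior is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: adds b[i] to a[i] only in lanes whose bit is set in
// `mask`; every other lane keeps a[i] unchanged. All arrays hold 16 floats.
void masked_add16(const float* a, const float* b, std::uint16_t mask, float* out) {
    assert(a != nullptr && b != nullptr && out != nullptr);
#if defined(__AVX512F__)
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    __mmask16 k = static_cast<__mmask16>(mask);  // one bit per float lane
    // Lanes whose bit in k is 0 are copied from the first argument (va).
    __m512 vr = _mm512_mask_add_ps(va, k, va, vb);
    _mm512_storeu_ps(out, vr);
#else
    // Scalar fallback with the same per-lane semantics.
    for (std::size_t i = 0; i < 16; ++i) {
        bool active = ((mask >> i) & 1u) != 0;
        out[i] = active ? a[i] + b[i] : a[i];
    }
#endif
}
```

With a mask of `0x0005`, only elements 0 and 2 are summed; the other fourteen lanes pass through from `a` untouched, with no branch inside the vector path.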

  • @clikclikboom8144 1 month ago +1

    Ahead of the curve on the 1.5 V thing with Intel chips. Great video, I don't have any use case for AVX in my projects but it's interesting to learn how it actually works.

    • @ProceuTech 1 month ago +1

      My more recent AVX512 video does it a lot better and more in depth if you’re interested! 🤓

    • @clikclikboom8144 1 month ago +1

      @@ProceuTech Just finished, thank you for making these videos!

  • @ponchobob 5 months ago +2

    @7:33 Did I miss something? Returning pointers to local variables is unsafe and leads to unpredictable behavior of the program.

    • @ProceuTech 5 months ago

      Yeah, this video is honestly not great. Check out my more recent AVX video if you want to look into the syntax more deeply! I explain it way better and without any of these goofy mistakes on my end

  • @realforest 2 years ago +1

    Your explanation at the end was very helpful!
    Me: "Why the hell would I ever use AVX instructions?"
    AVX: "Umm, you can skip an extra loop to traverse a vector, giving you a lot of performance if you do a lot of vector arithmetic!"

  • @anshumandhuliya 1 year ago +1

    Very nice and gentle introduction to the topic :)

  • @ohmygosh6176 3 months ago +1

    Update: any AMD Zen 4 and up has AVX512 support. The game Helldivers 2 uses AVX.

  • @salvageddoor 2 years ago +7

    Just one small question: How can you return the array ret[16] in the function linear::vector_add()? It's a local variable so how can it be returned? Or am I missing something that is possible in C++?

    • @ProceuTech 2 years ago +2

      Let me do some coding real quickly and do some tests. I’ll get back to you in a few minutes!

    • @ProceuTech 2 years ago +4

      Ok so I just reran the function in order to see what was actually going on in the array. Turns out it wasn’t returning proper values! Thanks for catching that! I’m so used to working with vectors (which can be returned), and don’t have as much experience with arrays.
      Sorry for the confusion!

    • @salvageddoor 2 years ago +2

      @@ProceuTech Thanks for clearing up my doubt! At least I know that it is possible to return a local vector in C++.

    • @ProceuTech 2 years ago +3

      Vectors can still be processed using AVX as well. You just have to use “_mm512_set_ps(i[0], i[1], etc., i[15]);”, which takes up more space in your program but offers identical performance!

    • @vytah 2 years ago

      @@ProceuTech With std::vectors, you can just use i.data(), which is the pointer to the internal array. As for returning AVX values, you can just return __m512 directly, or populate a std::vector via data() and return it.
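
The points in this thread (a local array cannot be handed back through a pointer, while a `std::vector` can be returned by value and its storage reached through `data()`) can be sketched as below. This is an illustrative rewrite under stated assumptions, not the video's `linear::vector_add`: it uses 256-bit AVX rather than AVX-512 so it runs on more machines, and the intrinsics are only compiled when AVX is enabled at build time (for example with `-mavx`); otherwise the scalar loop handles everything.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Returns a new vector holding a[i] + b[i]. Returning std::vector by value is
// safe because the result owns its storage, unlike a pointer to a local array.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;
#if defined(__AVX__)
    // 8 floats per 256-bit register; data() exposes the contiguous storage.
    for (; i + 8 <= a.size(); i += 8) {
        __m256 va = _mm256_loadu_ps(a.data() + i);
        __m256 vb = _mm256_loadu_ps(b.data() + i);
        _mm256_storeu_ps(out.data() + i, _mm256_add_ps(va, vb));
    }
#endif
    // Scalar tail (or the whole range when AVX is not enabled at build time).
    for (; i < a.size(); ++i) out[i] = a[i] + b[i];
    return out;
}
```

The unaligned load and store variants are used because `std::vector<float>` does not promise 32-byte alignment for its buffer.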

  • @mkvalor 2 years ago +2

    I know, I'm adding to this comment section nearly two years later BUT... AVX-512 was almost certainly more than 77.5% faster than scalar. The values for the arrays were read "cold" from RAM for the AVX-512 function call, but the memory reads for that operation placed those values in the L1 data cache for the scalar loop. Benchmarking is HARD!

    • @ProceuTech 2 years ago

      Is there an explanation as to why?

    • @mkvalor 2 years ago

      @@ProceuTech The first program to load a file from disk pays a time penalty for the disk I/O operations; however, the OS then keeps as much of that file in the system RAM as possible and some of the file even resides within the fast cache of the CPU itself. The next program you run which needs to read that file will retrieve the data very quickly from the CPU cache and system RAM. So that second program doesn't pay the same time penalty for disk I/O operations.

    • @lupsik1 1 year ago +1

      @@mkvalor I have a problem understanding what you mean by the loading from disk. When the program is loaded those values are going straight into RAM. When the variables get initialised they get pushed onto the stack. When the AVX function is called all that happens is the address of a gets copied into the RAX register, and the address of b gets copied into RDX. The exact same thing happens when we run the linear function.
      Are you suggesting that the page containing this tiny program gets unloaded mid-execution?

    • @treelibrarian7618 1 year ago

      For sure there's a lot wrong with the test. First, the input data is unchanging, so the compiler should optimize out the loop entirely, or maybe just the memory reads. But this would also almost completely invalidate your argument about caching, which would be valid if the test actually had a significant volume of data and was storing the result somewhere. As someone already noted, though, the compiler may well have used vector instructions for the simple loop as well (more likely with clang, I think), giving the somewhat poor showing of 70% speedup. It would all depend on compiler flags for optimization level and target architecture. If it didn't optimize everything well, then there may instead be a whole lot of overhead from the function calls and extra memory reads/writes involved. If I were to write assembler code to do what is presented in this test (on multiple data) it wouldn't take 80µs on a 5 GHz CPU. AFAIK these CPUs are capable of 2 reads, 1 add and 1 write per clock, even at 512 bits, so the whole process should take < 1µs with AVX512 instructions. Even with scalar instructions (which still execute on the vector ALU, just through 1 channel) it should have happened in 15µs, slowed from 8µs only by the scalar reads of memory.
      To get a more reliable result, probably a significant chunk of data and >1000000 iterations would be needed, and likely 100s of repetitions of the whole process to account for variations of CPU load, clock frequency (the OS usually keeps clocks low till something starts happening, but it takes a few ms for it to respond), interrupting operations etc. And check the disassembly to be sure of what is being executed.
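
The measurement advice above (warm up first, repeat many times, vary the input, keep the result observable) could be sketched roughly like this. It is a minimal harness under assumptions, not the benchmark from the video: `add_arrays` and `time_best_ns` are hypothetical names, the kernel is a plain scalar loop standing in for whichever version is being measured, and taking the minimum over repetitions is just one common way to reduce noise from clock ramp-up and interrupts.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Kernel under test; swap in a SIMD version to compare against this one.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Runs the kernel `reps` times after `warmup` untimed runs and returns the
// smallest wall time of a single run, in nanoseconds.
double time_best_ns(std::size_t n, int warmup, int reps) {
    assert(n > 0 && reps > 0);
    std::vector<float> a(n, 1.0f), b(n, 2.0f), out(n);
    volatile float sink = 0.0f;  // reading results into a volatile keeps them observable
    // Warm-up runs pull the data into cache and give the clocks time to ramp up.
    for (int w = 0; w < warmup; ++w) add_arrays(a.data(), b.data(), out.data(), n);
    double best = 1e300;
    for (int r = 0; r < reps; ++r) {
        a[0] = static_cast<float>(r);  // vary the input so runs are not identical
        auto t0 = std::chrono::steady_clock::now();
        add_arrays(a.data(), b.data(), out.data(), n);
        auto t1 = std::chrono::steady_clock::now();
        sink = sink + out[0] + out[n - 1];
        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        best = std::min(best, ns);
    }
    return best;
}
```

A call such as `time_best_ns(1 << 20, 10, 200)` gives one number per kernel; as the comment says, the disassembly still needs checking, since the compiler may auto-vectorize the "scalar" loop.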

    • @ProceuTech 1 year ago

      @treelibrarian7618 I made an updated video with this information in it; the tests done in this video were flawed

  • @subbastionbastion2167 6 months ago +2

    Sorry, but it sounds like CUDA would be way faster, and you can have thousands of threads running at the same time on larger chunks of data

    • @ProceuTech 6 months ago +1

      I've also got a video exploring CUDA and its syntax. It's much more well put together in my opinion than this video! :)

  • @RoboticusMusic 11 months ago +2

    I came here because I vaguely remember someone mentioning something that can cause a CPU to overheat insanely fast. Is there something else that can overheat a CPU even faster, or was this it?

    • @sean8102 3 months ago +1

      Well, AVX is very demanding, so the CPU uses a lot of power when executing AVX-heavy code. And of course more power = more heat. Burn-in apps like Prime95, I believe, use or have the option to use AVX/AVX512 during the burn-in test to really push the CPU as hard as possible. As for causing a CPU to overheat: no, it should not do that if you have a stable setup.

  • @dagoberttrump9290 6 months ago +1

    What happens if you align the SIMD-processed vector to cache-line boundaries?

  • @opoxious1592 7 months ago +1

    To this day, I have never seen a real benefit in a game that needed AVX instructions.
    A good example is Cyberpunk 2077.
    In the very beginning it would only run on CPUs with AVX support.
    A few months later they also made the game run without AVX support.
    There is not a single bit of difference with or without AVX regarding graphics or performance in fps.
    It's a good thing that more and more games do not require AVX anymore, since it demands more resources and energy from your system without any visible gain in performance

  • @Quancept 1 year ago +1

    Very underrated video!

  • @vinstontan9502 1 year ago

    Excellent video! Effectively explains AVX

  • @Antagon666 2 years ago +3

    You should make sure your memory is aligned to 32 bytes when using the aligned load function (64 bytes for 512-bit vectors).
    Also, chances are the non-AVX version got auto-vectorized by the compiler to use AVX/AVX2.
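
The alignment requirement can be sketched as follows, assuming 256-bit AVX, where the aligned load `_mm256_load_ps` requires a 32-byte-aligned address (the 512-bit `_mm512_load_ps` requires 64 bytes, which is also a common cache-line size). `is_aligned` and `sum8_aligned` are hypothetical helper names, and the intrinsics path is only compiled when AVX is enabled at build time; otherwise a plain copy stands in for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// True if p sits on a `bytes`-byte boundary.
bool is_aligned(const void* p, std::size_t bytes) {
    return reinterpret_cast<std::uintptr_t>(p) % bytes == 0;
}

// Sums 8 floats from a buffer that the caller guarantees is 32-byte aligned.
float sum8_aligned(const float* p) {
    assert(is_aligned(p, 32));
    alignas(32) float lanes[8];  // alignas gives the local buffer a 32-byte boundary
#if defined(__AVX__)
    // Aligned load and store: both addresses must be 32-byte aligned.
    _mm256_store_ps(lanes, _mm256_load_ps(p));
#else
    for (int i = 0; i < 8; ++i) lanes[i] = p[i];
#endif
    float s = 0.0f;
    for (int i = 0; i < 8; ++i) s += lanes[i];
    return s;
}
```

When alignment cannot be guaranteed (for example with a buffer owned by `std::vector<float>`), the unaligned variants `_mm256_loadu_ps` and `_mm256_storeu_ps` are the safer choice.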

  • @KristianDjukic 1 year ago

    Thx for the excellent video!

  • @LegendLength 1 year ago

    How important is volatile when coding with AVX?

    • @treelibrarian7618 1 year ago +1

      No more than normal. Volatile is for when something (like another thread) might possibly modify the memory of a variable without the knowledge of the current thread, so the compiler should treat it as a volatile (subject to unpredictable change) value and re-read it whenever it needs to use it, rather than assume its value stays the same as long as the current thread hasn't changed it. This prevents certain compiler optimisations that would assume the value is unchanging. AVX memory reads and writes happen in a single cycle like normal register reads and writes, so there's no real difference.
      It should also be noted that there are no "locked" versions of AVX instructions, so if you are trying to operate on vector data with multiple threads, you should work out some other way to prevent race conditions, like data segmentation or mutexes (preferably with lock elision, since the hardware memory synchronization involved in locks/mutexes is quite slow)

  • @MrMonkeyZMemeZ 2 years ago

    I too am a fan of AVX

  • @Psythik 2 months ago +1

    Interesting video, but I only understood about 25% of it. What's a "vector element"? "Scalar looping function"? "ISA"? Last time I heard the term "ISA" was in the early 90s, and it referred to a slot on the motherboard. "Segfault"? "bFloat extension"? This is a lot of terms to learn for a layman like me who just wants to know what AVX-512 is, and why only AMD CPUs have it now. By the time you got to the C++ part of the video, I had to stop watching entirely cause anything coding-related all goes right over my head.

    • @ProceuTech 2 months ago +2

      My more recent AVX video goes over it in much more detail and explains it in a way that’s more manageable for newcomers, at least I personally feel. I put more effort into the more recent video specifically to help explain what it is and expand it to a greater audience.

    • @Psythik 2 months ago +1

      @@ProceuTech I'll check it out; thanks.

  • @naveediqbal5600 2 years ago

    Is there a way to remove AVX instructions from a game?

    • @ProceuTech 2 years ago

      Some implementations have a toggle where you can switch between AVX and “Non-AVX” algorithms. Not all of them have this though :(
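
A toggle between AVX and "Non-AVX" algorithms is often built as a runtime dispatch: check what the CPU supports once, then pick a function pointer. The following is a rough sketch under assumptions, not how any particular game does it. `cpu_has_avx`, `add_scalar`, `add_avx` and `pick_add` are hypothetical names, and the detection relies on the GCC/Clang builtin `__builtin_cpu_supports` on x86; on other compilers or architectures the sketch simply falls back to the scalar path.

```cpp
#include <cassert>
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_GNU 1
#include <immintrin.h>
#else
#define SKETCH_X86_GNU 0
#endif

using AddFn = void (*)(const float*, const float*, float*, std::size_t);

// Reports whether the running CPU advertises AVX (conservatively "no" when
// the builtin is unavailable).
bool cpu_has_avx() {
#if SKETCH_X86_GNU
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}

// "Non-AVX" algorithm: plain scalar loop.
void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if SKETCH_X86_GNU
// AVX algorithm; the target attribute lets this one function use AVX even
// when the rest of the program is built without -mavx.
__attribute__((target("avx")))
void add_avx(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(out + i,
                         _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
#endif

// The toggle: returns the AVX version only if allowed and supported.
AddFn pick_add(bool allow_avx) {
#if SKETCH_X86_GNU
    if (allow_avx && cpu_has_avx()) return add_avx;
#endif
    (void)allow_avx;
    return add_scalar;
}
```

A program without such a fallback, built with AVX instructions throughout, will typically crash with an illegal-instruction error on a CPU that lacks AVX, which is why the toggle has to exist in the program itself.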

  • @juanme555 2 years ago

    i7 10700F vs 11700F, which one is better at AVX512?

    • @ProceuTech 2 years ago +1

      The 11700F! The 10700F only features AVX2!

    • @juanme555 2 years ago

      @@ProceuTech
      Is the 11700F the same as the 11700? I know the F doesn't have an iGPU, but does the iGPU help with AVX512?

    • @ProceuTech 2 years ago +1

      No, AVX-512 units are in the CPU cores themselves!

  • @SystemCrasher113 6 months ago

    I still have no clue what avx does after you explained it in detail. 😂 Don't worry though, it's me, not you.

    • @ProceuTech 5 months ago

      I have another AVX video that goes more in depth as to what the instruction set entails, as well as a better guide on programming for it. Might be worth a watch if you’re confused!

  • @MrRayopt 4 months ago

    Where is the beginner tutorial? This makes no sense

    • @ProceuTech 4 months ago +1

      The more recent video I made about AVX512 (linked in the first few seconds of the video) explains the concept and the program much better

  • @youtubeshadowbannedmylasta2629

    AVX hinders performance. Putting AVX into programs (even decades-old ones) can take things that used to work and make it so they no longer even launch.

    • @sean8102 3 months ago +1

      As for AVX hindering performance: I'm not a programmer. My only guess is maybe the "AVX offset" a lot of motherboards have, where it downclocks the CPU by some amount when using AVX (though I'm pretty sure that can be turned off on most motherboards). As for AVX being a problem because of compatibility, I guess so if you have a really old CPU. On the latest Steam hardware survey (June 2024), 97% of Steam users have a PC that supports AVX. From what I understand, Intel and AMD started shipping CPUs with AVX support in 2011.

  • @NihalSingh-ld2en 1 year ago

    These videos are a distraction

    • @sean8102 3 months ago

      Then don't watch them?