C++ Weekly - Ep 433 - C++'s First New Floating Point Types in 40 Years!

  • Published: 1 Oct 2024
  • ☟☟ Awesome T-Shirts! Sponsors! Books! ☟☟
    Upcoming Workshop: Applied constexpr: The Power of Compile-Time Resources, C++ Under The Sea, October 10, 2024
    ► cppunderthesea...
    Episode details: github.com/lef...
    T-SHIRTS AVAILABLE!
    ► The best C++ T-Shirts anywhere! my-store-d16a2...
    WANT MORE JASON?
    ► My Training Classes: emptycrate.com/...
    ► Follow me on twitter: / lefticus
    SUPPORT THE CHANNEL
    ► Patreon: / lefticus
    ► Github Sponsors: github.com/spo...
    ► Paypal Donation: www.paypal.com...
    GET INVOLVED
    ► Video Idea List: github.com/lef...
    JASON'S BOOKS
    ► C++23 Best Practices
    Amazon Paperback: amzn.to/47MEAhj
    Leanpub Ebook: leanpub.com/cp...
    ► C++ Best Practices
    Amazon Paperback: amzn.to/3wpAU3Z
    Leanpub Ebook: leanpub.com/cp...
    JASON'S PUZZLE BOOKS
    ► Object Lifetime Puzzlers Book 1
    Amazon Paperback: amzn.to/3g6Ervj
    Leanpub Ebook: leanpub.com/ob...
    ► Object Lifetime Puzzlers Book 2
    Amazon Paperback: amzn.to/3whdUDU
    Leanpub Ebook: leanpub.com/ob...
    ► Object Lifetime Puzzlers Book 3
    Leanpub Ebook: leanpub.com/ob...
    ► Copy and Reference Puzzlers Book 1
    Amazon Paperback: amzn.to/3g7ZVb9
    Leanpub Ebook: leanpub.com/co...
    ► Copy and Reference Puzzlers Book 2
    Amazon Paperback: amzn.to/3X1LOIx
    Leanpub Ebook: leanpub.com/co...
    ► Copy and Reference Puzzlers Book 3
    Leanpub Ebook: leanpub.com/co...
    ► OpCode Puzzlers Book 1
    Amazon Paperback: amzn.to/3KCNJg6
    Leanpub Ebook: leanpub.com/op...
    RECOMMENDED BOOKS
    ► Bjarne Stroustrup's A Tour of C++ (now with C++20/23!): amzn.to/3X4Wypr
    AWESOME PROJECTS
    ► The C++ Starter Project - Gets you started with Best Practices Quickly - github.com/cpp...
    ► C++ Best Practices Forkable Coding Standards - github.com/cpp...
    O'Reilly VIDEOS
    ► Inheritance and Polymorphism in C++ - www.oreilly.co...
    ► Learning C++ Best Practices - www.oreilly.co...
  • Science

Comments • 56

  • @gnolex86
    @gnolex86 3 months ago +94

    std::float16_t is known as half-float in graphics programming. It's a type used to store attributes that need dynamic range but don't need high precision, like colors in HDR. It is worth noting, however, that most CPUs don't have instructions to operate on them directly. Doing any arithmetic operations on them might require hidden conversion to a usable type, like float or double, and without hardware extension this conversion will be done directly in software, so it should be treated as a storage type in most cases. Modern CPUs with AVX-512 support half-floats directly but you have to explicitly enable that with a compiler option.
    std::bfloat16_t was invented to alleviate the high cost of converting half-floats: conversion to float is a trivial copy of the bfloat into the upper 16 bits of a float with the lower 16 bits zeroed out, while converting from float to bfloat is truncation of the lower 16 bits. This can be done trivially using 32-bit registers, so it gained a lot of popularity.
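    The conversions described above can be sketched with `std::bit_cast` (C++20). The helper names here are my own, for illustration, not anything from the video or the standard:

    ```cpp
    #include <bit>
    #include <cassert>
    #include <cstdint>

    // bfloat16 is just the top 16 bits of an IEEE-754 binary32 float.

    // float -> bfloat16: keep the upper 16 bits (truncates, i.e. rounds toward zero)
    std::uint16_t float_to_bfloat16(float f) {
        return static_cast<std::uint16_t>(std::bit_cast<std::uint32_t>(f) >> 16);
    }

    // bfloat16 -> float: place the 16 bits in the upper half, zero the lower half
    float bfloat16_to_float(std::uint16_t b) {
        return std::bit_cast<float>(static_cast<std::uint32_t>(b) << 16);
    }

    int main() {
        // 1.0f is 0x3F800000 as bits, so its bfloat16 form is 0x3F80
        assert(float_to_bfloat16(1.0f) == 0x3F80);
        // Values whose mantissa fits in bfloat16's 7 mantissa bits round-trip exactly
        assert(bfloat16_to_float(float_to_bfloat16(3.140625f)) == 3.140625f);
    }
    ```

    Note this sketch truncates rather than rounding to nearest; hardware bfloat16 conversions typically round, so the two can differ by one ULP.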

    • @anon_y_mousse
      @anon_y_mousse 3 months ago +1

      Thanks for explaining how it works. I actually like this addition, for once, and will definitely copy that in my own language.

    • @Blue_KVR
      @Blue_KVR 3 months ago

      I didn't get how bfloat would help with the high cost of conversion to a normal float. Can you point me to some resource? Thanks

    • @gnolex86
      @gnolex86 3 months ago +3

      @@Blue_KVR bfloat -> float conversion: you shift bfloat bits 16 places to the left. float -> bfloat conversion: you shift float bits 16 places to the right. That's basically it. Conversion between float and half-float requires either extended instructions or manually decoding sign, exponent and mantissa between the two by shifting and masking out individual parts.

  • @CryZe92
    @CryZe92 3 months ago +84

    I don't think it came across too clearly, but bfloat is just the top 16 bits of a float32, which is often faster than float16 because if your hardware doesn't have 16-bit float operations, the bfloat can just use 32-bit float operations and discard the bottom 16 bits when storing to memory.

    • @mytech6779
      @mytech6779 3 months ago +3

      Why would you bother with 16b at all if your hardware doesn't support it? Most programming with 16b floats is to target specific 16b hardware. The easy zero-fill and truncation for float32 conversion is mostly just a nice side effect for moving between the 16b-specific hardware and the CPU. Bfloat's main purpose is to maintain range while sacrificing precision. I've been out of the loop for a few years but there was a push for bfloat (and some very similar siblings) in graphics hardware acceleration.

    • @mytech6779
      @mytech6779 13 days ago

      @@anonymousalexander6005 Yeah I can see the benefit of cramming twice the data into cache, I was only considering it from a processor efficiency standpoint. Like hardware that can do 16b operations in an AVX512 register.

  • @ohwow2074
    @ohwow2074 3 months ago +38

    Long double doesn't necessarily represent std::float128_t. The former is implementation-defined: on some x86 compilers it's the outdated x87 80-bit float type; on others it's the same as double.
    But std::float128_t is the quadruple-precision type defined by IEEE 754. It's really not just a typedef for long double!
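    A quick way to see the difference on your own toolchain is a sizeof test; the exact numbers are platform-dependent, and `std::float128_t` only exists where the compiler defines `__STDCPP_FLOAT128_T__`:

    ```cpp
    #include <cfloat>
    #include <cstdio>
    #if __has_include(<stdfloat>)
    #include <stdfloat>
    #endif

    int main() {
        // long double is implementation-defined: 80-bit x87 (padded to 16 bytes)
        // on GCC/Clang x86-64, but plain 64-bit double on MSVC.
        std::printf("sizeof(long double) = %zu, mantissa bits = %d\n",
                    sizeof(long double), LDBL_MANT_DIG);
    #ifdef __STDCPP_FLOAT128_T__
        // std::float128_t is guaranteed IEEE-754 binary128 (113 mantissa bits),
        // regardless of what long double happens to be.
        std::printf("sizeof(std::float128_t) = %zu\n", sizeof(std::float128_t));
    #endif
    }
    ```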

  • @avramlevitter6150
    @avramlevitter6150 3 months ago +8

    A neat floating-point trick I've seen: if all of your numbers are between 1.0 and 2.0, they all share the same sign and exponent bits, so you can sort them with a radix sort on the raw bits. Radix sort, as it turns out, is highly parallelizable.
    So if you have a list of confidences from 0.0 to 1.0, you can add 1.0 to them all, radix sort them, and then subtract 1.0.
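    A minimal sketch of the trick, using `std::sort` as a stand-in for an actual radix sort (the point is only that, once sign and exponent are fixed, unsigned integer order of the bit patterns matches float order, so any integer sort works):

    ```cpp
    #include <algorithm>
    #include <bit>
    #include <cassert>
    #include <cstdint>
    #include <vector>

    int main() {
        // Confidences in [0.0, 1.0]; shift into [1.0, 2.0] so every value
        // shares the same sign and exponent bits.
        std::vector<float> vals{0.25f, 0.9f, 0.0f, 0.5f, 0.75f};
        std::vector<std::uint32_t> keys;
        for (float v : vals)
            keys.push_back(std::bit_cast<std::uint32_t>(v + 1.0f));

        // Unsigned integer order now matches float order, so an integer sort
        // (here std::sort; in practice a parallel radix sort) sorts the floats.
        std::sort(keys.begin(), keys.end());

        vals.clear();
        for (std::uint32_t k : keys)
            vals.push_back(std::bit_cast<float>(k) - 1.0f);
        assert(std::is_sorted(vals.begin(), vals.end()));
    }
    ```

    Note the +1.0/-1.0 round trip can cost a ULP of precision for values that don't fit exactly after the shift, which is usually fine for confidences.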

    • @arthurmoore9488
      @arthurmoore9488 28 days ago

      Neat. Can you point me to any more info on this?

  • @TheDoubleBee
    @TheDoubleBee 3 months ago +14

    The `long double` type is only 128 bits in size on platforms that support an extended-precision floating-point type, i.e. x86, and of those 128 bits only 80 are used. However, MSVC does not support it at all, so even on x86, `long double` is the same 64-bit double-precision floating-point type as `double`.

  • @IndellableHatesHandles
    @IndellableHatesHandles 3 months ago +7

    It's always confused me whether I should use double or float. Maybe we should have a comparison of these data types' precision for certain applications (large game worlds, for example).

    • @BryceDixonDev
      @BryceDixonDev 3 months ago +1

      I actually had this come up for the first time as a "real world" issue when I was working at 343 shipping Halo Infinite: the world was too big and all the transformation matrices and functions had to get updated from float to double to prevent imprecision from causing vertex jitter.
      IMO, even though everyone still defaults to `float`, I've started defaulting to `double` unless I'm *explicitly* trying to fit more consecutive data within a cache line or something. We're no longer in an era where saving 4 bytes, even a few thousand times, will matter, but programs breaking down at the edge of their fundamental capabilities *does* cause issues. This is why Godot switched their engine from defaulting to 32 bit floats to 64 bit doubles a while back to match their default of 64 bit integers.
      Are there cases where having the smaller size matters? Absolutely, but you'll know it when you see it and shouldn't need someone else to point out that, say, doing matrix math on 2 billion floating point numbers all near 1.0 might benefit from using a smaller size type.

    • @mytech6779
      @mytech6779 3 months ago +5

      The practical difference is extremely common information.

  • @Bloodwyn1756Swagger
    @Bloodwyn1756Swagger 3 months ago +6

    I wish there was official support for a 128-bit integer. I actually need that fairly frequently.

    • @romangeneral23
      @romangeneral23 3 months ago +3

      What do you do on a daily basis that requires that?

    • @ohwow2074
      @ohwow2074 3 months ago +2

      Use compiler built-in types. GCC has it. The C++ standard doesn't want to add it anytime soon.

    • @Bloodwyn1756Swagger
      @Bloodwyn1756Swagger 3 months ago

      @@romangeneral23 Specialized quantization/compression of multi-dimensional data points to 64-bit integers. It's very handy to have a bigger int type to do some of the operations with. At some point I wrote my own uint abstraction to make arbitrarily big integers of static size work.

    • @Nobody1707
      @Nobody1707 3 months ago

      ​@@ohwow2074 "The C++ standard doesn't want to add it anytime soon." This isn't true. There's actually a proposal (P3140) to add std::(u)int_least128_t, and by proxy std::(u)int128_t on platforms that support a native 128-bit integer.

  • @Sebanisu
    @Sebanisu 3 months ago +3

    This should have been there a while ago, I think.

  • @Omnifarious0
    @Omnifarious0 3 months ago +3

    I appreciated this video. And I had no idea this change was being made.

  • @sibinrsl7659
    @sibinrsl7659 3 months ago +1

    How about CUDA: can we use it there to allocate memory using sizeof(float16)? 😊

  • @oracleoftroy
    @oracleoftroy 3 months ago +7

    long double is 80-bit I think, at least on x86.

    • @obinator9065
      @obinator9065 3 months ago +1

      on GCC, but not on clang (or msvc?) iirc

    • @TheDoubleBee
      @TheDoubleBee 3 months ago +3

      `long double` is implemented as the extended-precision floating-point type on the x86 architecture, which uses 80 bits, but the type itself is 128 bits in memory size. However, MSVC does not support it at all on any architecture.

    • @oracleoftroy
      @oracleoftroy 3 months ago +1

      Hmm, I'll have to play around with it. It makes sense that memory-wise it needs to align to a larger size, but I could have sworn it worked in MSVC... not that I ever used it except in test programs, and probably not for 10+ years at that.
      From my perspective, realizing it is 80-bit FPU math instead of 128-bit SSE or similar seems important. I would expect the float128_t types to do full 128-bit floating-point operations if they exist on a given platform, not just take up that much space for a lower-precision operation.

    • @oracleoftroy
      @oracleoftroy 3 months ago

      Had a chance to play a little bit with it. Nothing exciting, just a quick sizeof test. MSVC and Clang on Windows give sizeof 8 for long double, same as a normal double, so TIL. I wonder if I just made a test program with the type but never checked the size or anything.
      On Linux (under WSL2, though it shouldn't matter), it's 16 for both. I didn't check what instructions it uses. GCC 14 on Linux also has all the new types, where none of the other compilers/platforms I checked had them yet.
      Not sure when I'll get back to it, but I'd like to take a look at what instructions it generates for those types.

    • @TheDoubleBee
      @TheDoubleBee 3 months ago

      @@oracleoftroy I don't know of any platform that supports 128-bit floating-point arithmetic - even usage of x86 extended-precision arithmetic is completely non-standard, as you found out yourself. SSE and AVX, on the other hand, having 128- and 256-bit registers respectively, have nothing to do with 128- and 256-bit FP arithmetic - these are SIMD instruction sets, which means an SSE or AVX register holds several values of one type (floating-point or integer) and a single instruction operates on all of those values at the same time.

  • @maxbd2618
    @maxbd2618 3 months ago +1

    Waiting for fixed-point types to be added to the standard

  • @JoeBurnett
    @JoeBurnett 11 days ago

    Good stuff! Thanks!

  • @NikorouKitsunerou
    @NikorouKitsunerou 3 months ago

    A wonderful addition, but it remains to be seen how it actually stands up to C#'s Half type. So it's exciting to see whether float16_t's implementation is correct; otherwise maybe not so exciting after all.

  • @marcususa
    @marcususa 3 months ago

    I would like to get Opcode Puzzlers Book 1, but is there a sample puzzle from the book? I can't even make out what is on the cover of Book 1. I assume then that there is a Book 2 (out or coming out)? In that case, since you have many of these puzzle books, is there a way to get more than just one at a time? My big pet peeve with all this is colleges don't help students plan out courses, one after the other. They are only looking for that one "mp3" to purchase. I would prefer to pay for a collection (album) and know that the collection is there for reference for future use.

  • @tomkirbygreen
    @tomkirbygreen 3 months ago

    Thank you Jason. I have to admit bfloat was news to me! :-)

  • @jaycarlson2579
    @jaycarlson2579 2 months ago

    cool!

  • @GoatTheGoat
    @GoatTheGoat 3 months ago +1

    What if I want to use Intel 80 bit float format?

    • @mytech6779
      @mytech6779 3 months ago

      The 8087 coprocessor generally only used 80 bits internally (including its own allocation of memory used for temp storage) to avoid significant-digit rounding errors; the inputs and final outputs were 64-bit.

    • @GoatTheGoat
      @GoatTheGoat 3 months ago

      ​@@mytech6779 Yes, Intel intended the 80-bit format for intermediate double-float (64-bit) calculations. But it is possible to load and store the whole 80 bits, so people use it as a native format.

  • @tlacmen
    @tlacmen 3 months ago

    Can you explain this sentence: "the fixed width floating-point types must be aliases to extended floating-point types (not float / double / long double). " please? What are extended types?

    • @sjswitzer1
      @sjswitzer1 3 months ago +5

      They are aliases to compiler-reserved types that are distinct from any of the legacy types for overloading purposes. This is helpful to avoid problems where, for instance, int32_t might or might not be an alias of int so that you have no portable way of knowing whether you can (or should!) overload on both int and int32_t.
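      A sketch of the overload point (guarded, since `std::float32_t` only exists where the compiler defines `__STDCPP_FLOAT32_T__`; the `which` function here is mine, for illustration):

      ```cpp
      #include <cassert>
      #include <cstring>
      #if __has_include(<stdfloat>)
      #include <stdfloat>
      #endif

      // Because std::float32_t aliases a distinct extended floating-point type,
      // this overload pair is always well-formed where the type exists. The
      // int/int32_t equivalent is not portable: int32_t may be an alias of int,
      // which would make the pair a redefinition.
      const char* which(float)          { return "float"; }
      #ifdef __STDCPP_FLOAT32_T__
      const char* which(std::float32_t) { return "std::float32_t"; }
      #endif

      int main() {
          // A float argument exactly matches the float overload
          assert(std::strcmp(which(1.0f), "float") == 0);
      #ifdef __STDCPP_FLOAT32_T__
          // A std::float32_t argument exactly matches the extended-type overload
          assert(std::strcmp(which(std::float32_t{1.0f}), "std::float32_t") == 0);
      #endif
      }
      ```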

  • @X_Baron
    @X_Baron 3 months ago

    Is bfloat16 covered at all by the floating point standard? It seems that its rounding behavior varies between implementations.

    • @FryGuy1013
      @FryGuy1013 3 months ago +1

      It's just the first 16 bits of a 32-bit float.

  • @duckdoom5
    @duckdoom5 3 months ago +2

    I'm hoping we also get support for specifying the precision bits ourselves (a.k.a. fixed-point). It would be very helpful for games and graphics, where this is common practice. And support for unorm and snorm types to pass to graphics buffers directly.

  • @marka7970
    @marka7970 3 months ago +1

    🤯

  • @sinom
    @sinom 3 months ago +3

    It's funny that literally today I was trying to use this header, but for whatever reason MSVC decided they don't want to implement it... Making this entirely optional was a mistake.

    • @cppweekly
      @cppweekly  3 months ago +3

      :(

    • @GeorgeTsiros
      @GeorgeTsiros 3 months ago

      Well. MSVC. You know. Thankfully, vs2022 does offer LLVM, so, that's kinda nice.

  • @jamesburgess9101
    @jamesburgess9101 3 months ago

    Ha! 1:20 "[16 bit float] might matter to you if you work on small devices" are you calling our 50K cpu core renderfarms a "small device"? :-) FWIW the visual effects community has been using 16 bit floats for a couple of decades where it matters a lot.

    • @cppweekly
      @cppweekly  3 months ago

      I knew it was intended for better vectorization on GPU applications, but I did not know it was used for graphics rendering!

    • @jamesburgess9101
      @jamesburgess9101 3 months ago

      @@cppweekly Well, truth be told, rendering is mostly a float32 thing; on-disk storage and compositing is where 16 comes in.

  • @Key_Capz_
    @Key_Capz_ 3 months ago

    nice

  • @RishabhDeepSingh
    @RishabhDeepSingh 3 months ago +1

    First