Crust of Rust: Atomics and Memory Ordering

  • Published: 20 Oct 2024

Comments • 134

  • @haystackdmilith
    @haystackdmilith 3 years ago +158

    "Don't use spinlocks! Don't implement spinlocks! Ok, let's implement a spinlock". Love it!

    • @sycration
      @sycration 3 years ago +1

      @Cooper Rene LOOL just use a private tracker, don't be a punk

  • @molozful
    @molozful 2 years ago +41

    It's incredible how coherent you can be in a live session, and on such a tricky topic. Kudos!

  • @KohuGaly
    @KohuGaly 3 years ago +113

    This video exposes one concept that you rarely notice in regular code. Computer code looks sequential, when written down as text, but in actuality it defines an oriented graph of operations. If two operations don't lie on the same path through the edges of the graph, their relative order is undefined.
    Multi-threaded memory access exposes this weird quirk, because it itself may lack inherent ordering (i.e. it allows for race conditions).
    Memory ordering introduces extra edges to the graph, to restrict possible orderings.

    • @andynicholson7944
      @andynicholson7944 3 years ago +12

      🤯 that’s a really clear explanation. I’d never thought of code in those terms. Thanks for the perspective shift!

    • @michalkupczyk7
      @michalkupczyk7 3 years ago +4

      You may want to mentally map Ordering to Transaction Isolation levels in SQL databases.

    • @carterthaxton
      @carterthaxton 3 years ago +6

      Another important ingredient is out-of-order execution by modern processors. Normally such CPUs and their caches present a programmer-visible model that appears to execute sequentially. When you run multiple threads, especially on symmetric multi-processors (e.g. a quad-core CPU), these out-of-order execution models can expose far more interdependence than you might think could occur just due to "parallel execution". Ever since the Pentium Pro, for example, your CPU isn't even executing the raw instructions. It converts them into micro-ops, which are analyzed in real time, much like by a modern compiler, and reordered according to the directed graph of operations you describe. Then as many of those micro-ops as possible are executed simultaneously. Add to that the presence of multi-layer caches, shared between multiple processors, and the whole idea of a machine that executes one instruction at a time just completely breaks down.

    • @JordanBourdeau
      @JordanBourdeau 7 months ago +1

      This is a great explanation, thanks for sharing!

    • @Axman6
      @Axman6 3 months ago +3

      This is related to how Haskell's IO type is defined (in GHC): it is a state monad which passes around a special value (RealWorld#) that the compiler treats as being different every time it sees it: data IO a = IO (RealWorld# -> (a, RealWorld#)). This creates a strict ordering between IO actions; the compiler can't reorder them, because the RealWorld# token would no longer be passed between the actions in the same order. Other Haskell code is exactly what you've said, a graph of values, but one in which the order doesn't matter, because the values are always pure.
      (I'll note that "ordering of actions" didn't require the use of the m-word; that's what we call the ubiquitous pattern that IO fits nicely into.)

  • @kennichdendenn
    @kennichdendenn 3 years ago +78

    I want to give feedback:
    This is great. It fits my difficulty level perfectly and is exactly what I need - as the 2018 guideline suggests.
    I have some experience with Rust and am a computer science student currently just before my bachelor's degree, so I have some experience with programming and how a computer works on several levels. Heck, I wrote a small subset-of-C compiler. But I had, for instance, never heard of Ordering before.
    Those topics normally have a super high barrier to entry because of all the prior knowledge required, except that that knowledge has the same barrier. Your videos help break that and teach a lot about how computers work in general.
    Keep up the great work. I wish all my lectures were even half as engaging, informative, accessible and interesting.

  • @longtomjr22
    @longtomjr22 3 years ago +17

    My wife made me watch the intro 7 times. Makes us laugh every time! (Now if only I could also get her as excited about coding again)

    • @aqua3418
      @aqua3418 2 years ago +1

      Just have to come at it from an angle that's applicable to her in some way, that way she can see the possibilities and how it benefits/helps her :D

  • @kanuos
    @kanuos 1 year ago +3

    Could not have asked for anything better than this! Thank you so much for the work you do for us!

  • @shashanksharma21
    @shashanksharma21 1 year ago +1

    Thank you for explaining this in such a brilliant manner! This is the first time I have been able to understand this ordering issue. Thank you Jon!

  • @RobertWHurst
    @RobertWHurst 1 year ago +2

    Thank you for creating so many of these awesome videos Jon. I find them very beneficial personally, and always enjoy the adventures you take us on. Cheers 🍻

  • @nilshaberstroh2705
    @nilshaberstroh2705 3 years ago +3

    Another topic I might have avoided for much longer. But you manage to explain very complicated things (at least to me) in a very digestible way. Hat off 🎩. And the MESI protocol is indeed an interesting topic. Thanks for that hint as well.

  • @zamalek4079
    @zamalek4079 3 years ago +14

    Jeff Preshing has an incredible blog (and I think some videos) on this. He's a C developer, but as you said that doesn't matter.

  • @ericseppanen6579
    @ericseppanen6579 3 years ago +7

    One of the exciting aspects of fetch_update (or any CAS loop that generates the new value from the old) is that your closure may be executed more than once. If your closure has side effects, it sometimes surprises people that they get the side effect multiple times.
    A more common mistake is for the closure to read and write a value that exists outside the closure (e.g. "if !initialized { initialized = true; new_value = 42 }"). If the first CAS attempt fails, the second run of the closure generates an unexpected result because it inherits state from the first run. I've seen this bug multiple times in production code.
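
    A minimal sketch of that pitfall (hypothetical counter, not production code): the closure passed to fetch_update runs once per CAS attempt, so any state it touches outside itself leaks between attempts.

        use std::sync::atomic::{AtomicU32, Ordering};

        fn main() {
            let x = AtomicU32::new(0);
            let mut attempts = 0; // state living outside the closure

            // fetch_update retries the closure until its CAS succeeds, so under
            // contention this side effect can happen more than once per update.
            let _ = x.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
                attempts += 1; // runs once per attempt, not once per logical update
                Some(cur + 1)
            });

            println!("value = {}, closure ran {} time(s)",
                     x.load(Ordering::Relaxed), attempts);
        }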

  • @amcsi
    @amcsi 3 years ago +3

    Thank you so much for diving so deep into all the memory orderings, and all of everything regarding atomics! I watched the entire video.
    I was going to ask why they say Rust gives you safe multithreading if there are so many issues with it apparently, but then at the very end you basically said that it's just atomics that are the problem; mutexes are what really give you the safety :)

    • @berylliosis5250
      @berylliosis5250 3 years ago

      The other big difference is that if you don't use Atomics or unsafe, you can't mess up. C/C++ will happily pass a T* to another thread and let you get all the problems atomics have, and more. In Rust, code that doesn't use unsafe or a preexisting synchronization primitive (which necessarily uses unsafe) can't even experience a race condition (afaik).

    • @brdrnda3805
      @brdrnda3805 1 year ago

      @@berylliosis5250 Correction: ...can't even experience a -race condition- data race.
      From the Rustonomicon:
      "Safe Rust guarantees an absence of data races,..." "...However Rust does not prevent general race conditions."

  • @robert36902
    @robert36902 6 months ago

    Thanks a lot for this video! I feel it's a bit of a low-key hardcore topic that is great to know, even though simple mutexes and atomic counters have been enough for me so far :)

  • @oerglwoergl
    @oerglwoergl 3 years ago +7

    Very interesting, thanks. The general rule seems to be (as with crypto): don't roll your own (if you can avoid it).

  • @elio8119
    @elio8119 3 years ago +1

    I think this is the best explanation of how to use atomics.

  • @Ergzay
    @Ergzay 3 years ago +5

    For anyone testing along at home: while testing the first section with the "bad" mutex, it reliably generated completely random numbers every run when run with just "cargo run" instead of "cargo run --release", even without the yield. However, I'm running on a MacBook Air with the M1-based ARM processor, so that probably increases the randomness, as ARM's memory model is a little more relaxed than Intel's, I believe.
    Also, with everything on Relaxed, I still got reliable failures even after switching to the "correct" implementation using compare_exchange. The value was very close to correct and passed most of the time, however.

  • @MikeBondar
    @MikeBondar 3 years ago +1

    great video, thanks! this whole memory ordering thing is much clearer now

  • @BrazenNL
    @BrazenNL 2 years ago +1

    7:00 I couldn't live without Dark Reader!

  • @digama0
    @digama0 3 years ago +5

    Regarding your example of relaxed ordering, there is a variant that is even more mind bending:
    thread 1: { let r1 = y; x = r1; r1 }
    thread 2: { let r2 = x; if r2 == 42 { y = 42; } r2 }
    (All loads of x and y are relaxed.) The crazy thing is that it is _still_ allowed that both threads read 42. This seems impossible since there is a control dependency from the read of x to the write of y, but this can actually happen in practice if thread 2 speculatively executes y = 42, thread 1 picks that up and stores x = 42, and then thread 2 receives 42 from x and decides that the speculation was correct. In fact, relaxed loads are even worse than that, since in the following program
    thread 1: { let r1 = y; x = r1; r1 }
    thread 2: { let r2 = x; y = r2; r2 }
    it is _still_ allowed for r1 and r2 to be 42, which is now completely nuts since there aren't any 42's in the program. (Actually C11 has a clause saying there has to be a 42 somewhere in the program but you can just have a random 42 elsewhere to satisfy this constraint.) However this is an unrealistic execution, in that no hardware actually can produce 42 in this situation without literal time travel. (This is also known as the "out-of-thin-air" problem since 42 just appears by time travel shenanigans.)
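
    For anyone following along in Rust, the second program looks roughly like this (a sketch; the 42 outcome is admitted by the C11 model on paper, but no real hardware produces it):

        use std::sync::atomic::{AtomicU32, Ordering::Relaxed};
        use std::thread;

        static X: AtomicU32 = AtomicU32::new(0);
        static Y: AtomicU32 = AtomicU32::new(0);

        fn main() {
            let t1 = thread::spawn(|| {
                let r1 = Y.load(Relaxed); // thread 1: x = y
                X.store(r1, Relaxed);
                r1
            });
            let t2 = thread::spawn(|| {
                let r2 = X.load(Relaxed); // thread 2: y = x
                Y.store(r2, Relaxed);
                r2
            });
            // On paper the model admits r1 == r2 == 42 ("out of thin air");
            // in practice you will only ever see zeros here.
            println!("{} {}", t1.join().unwrap(), t2.join().unwrap());
        }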

  • @amirh6712
    @amirh6712 2 years ago +1

    Thank you so much for the great video, Jon.
    Just wanna point something out that I think you overlooked a little bit:
    There is also another guarantee with Relaxed ordering, which is that the operation is guaranteed to happen at some point in the future.
    For example, without using any ordering, these two snippets are considered equivalent from the compiler's perspective (this optimization is known as hoisting):
    while (!stoprequested) {
    }
    if (!stoprequested) {
        while (true) {
        }
    }
    However, if you set and read the stoprequested variable with Relaxed ordering, it is guaranteed that the first loop will break eventually, meaning the compiler will not translate it to the second form.
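
    In Rust terms, a sketch of the loop in question (hypothetical flag name): in practice compilers do not hoist an atomic load out of such a loop the way they can a plain read, though whether visibility is formally guaranteed "eventually" is debated in the reply below.

        use std::sync::atomic::{AtomicBool, Ordering::Relaxed};
        use std::{thread, time::Duration};

        static STOP_REQUESTED: AtomicBool = AtomicBool::new(false);

        fn main() {
            let worker = thread::spawn(|| {
                // An atomic load is re-performed on every iteration; the check
                // is not hoisted out of the loop like a non-atomic read could be.
                while !STOP_REQUESTED.load(Relaxed) {
                    std::hint::spin_loop();
                }
            });
            thread::sleep(Duration::from_millis(10));
            STOP_REQUESTED.store(true, Relaxed);
            worker.join().unwrap();
        }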

    • @jonhoo
      @jonhoo  2 years ago +1

      I actually looked through the spec for text along these lines, assuming that it _must_ be in there, and couldn't find it. Can you point me to where you see it written? The best I found was something along the lines of "the value is eventually visible" or something to that effect, but with no promises about what "eventually" means.

  • @jocketf3083
    @jocketf3083 3 years ago +3

    Thank You! I learned a lot from this video! However, I almost feel like parts of this API should require unsafe as an "I know what I'm doing" kind of thing. I was thinking that SeqCst could be the "safe" default and that methods where you can specify orderings would be unsafe. Perhaps that is taking it a bit far since this isn't really about memory safety, but it might go a little way towards fearless concurrency. Do you have any thoughts on that?

  • @casperes0912
    @casperes0912 1 year ago

    For a master’s course I implemented exhaustive and optimal enumeration of value release-acquire traces using an algorithm called DPOR. Fun stuff.

  • @iilugs
    @iilugs 1 year ago

    I was about to lose hope on understanding atomics... then I saw there's a Jon Gjengset video about it 😍

  • @dmitrysim
    @dmitrysim 7 months ago

    First of all, it's a great video, thank you.
    I thought it might be a good idea to think about the problem in terms of CPU cores rather than threads.
    Below, x.load -> c1 denotes "x.load was scheduled to run on core 1".
    2nd problem (Ordering::SeqCst section):
    - (x.store -> c1, y.store -> c2) finish.
    *** At this point c1: {x: true, y: false}, c2: {x: false, y: true}
    - (!x.load -> c1, !y.load -> c2), so both loops end;
    - (x.load -> c2, y.load -> c1)
    This way z == 0.
    1st problem (Ordering::Relaxed section):
    To achieve the case, we need at least "x.load finishes after y.store finishes", since y.store is the only way 42 can get into any core's memory.
    So this is only achievable with compiler shenanigans, in this case on 1 core:
    - y.store -> c1
    *** At this point c1: {y: 42}
    - y.load -> c1
    - x.store -> c1
    *** At this point c1: {x: 42}
    - x.load -> c1
    Also, the loom docs (docs.rs/loom/latest/loom/#relaxed-memory-ordering) state that "B can run before A", not "D can run before C", meaning "the compiler can shuffle commands even if the 2nd depends on the result of the 1st"? I hope it's just a typo.

  • @alliu6757
    @alliu6757 3 years ago +3

    Hi, Jon. Thanks for the amazing tutorial. Could you do some videos about tokio, crossbeam, and rayon?

  • @miriyalajeevankumar5449
    @miriyalajeevankumar5449 2 months ago

    Thanks a lot for your service.

  • @lettere9560
    @lettere9560 2 years ago

    The way I understand it is that a load/store instruction only affects the local cache of a particular core. Ordering::Acquire is an analogue of a recv operation and Ordering::Release is an analogue of a send operation. However, there is an additional rule where, if there is no incoming message on the bus, Ordering::Acquire just reads directly from its local cache.
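
    A sketch of that send/recv intuition (made-up flag and payload names): the Release store "sends" everything written before it, and the Acquire load that observes the flag "receives" it.

        use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
        use std::thread;

        static DATA: AtomicU32 = AtomicU32::new(0);
        static READY: AtomicBool = AtomicBool::new(false);

        fn main() {
            let producer = thread::spawn(|| {
                DATA.store(42, Ordering::Relaxed);
                READY.store(true, Ordering::Release); // "send": publish writes above
            });
            let consumer = thread::spawn(|| {
                // "recv": once the Acquire load sees true, DATA's 42 is visible too
                while !READY.load(Ordering::Acquire) {}
                assert_eq!(DATA.load(Ordering::Relaxed), 42);
            });
            producer.join().unwrap();
            consumer.join().unwrap();
        }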

  • @thomasfrans1185
    @thomasfrans1185 1 year ago

    32:48 is probably the most sarcastic "Oh nice, it works so well" I've ever heard XD.

  • @nikosefthiasjs
    @nikosefthiasjs 3 years ago +2

    Hi, thanks for the great video.
    What LSP plugin are you using? It is really responsive.

    • @jonhoo
      @jonhoo  3 years ago +4

      I'm using rust-analyzer through coc.nvim :)

  • @dinushkam2444
    @dinushkam2444 3 years ago

    Great video, learned a lot. Thanks! I might need to rewatch.

  • @timvw01
    @timvw01 1 year ago

    Learning a lot here. Still can't accurately predict the results when using ordering in certain cases, but this video explains the concepts very well. Perhaps some sketching of the concepts could help, as was done in the leftright/evmap vid.
    Thanks!

  • @johndowson1852
    @johndowson1852 3 years ago +18

    You should give Dark Reader extension a try. It gives a decent enough dark mode experience on pretty much any website.

    • @driedurchin
      @driedurchin 3 years ago

      Absolutely! Love that extension.

    • @mwcz5190
      @mwcz5190 3 years ago

      Great extension.

    • @jonhoo
      @jonhoo  3 years ago +22

      Ah, but you see, I don't actually _like_ dark mode for anything that has a lot of text on it like documentation. I *only* use dark mode for streaming because I was told by a bunch of viewers early on that their eyes hurt since they're watching the streams in a dark room ;)

    • @driedurchin
      @driedurchin 3 years ago +8

      @@jonhoo Blasphemer! Dark mode for life 🕶️

    • @AbhinavKulshreshtha
      @AbhinavKulshreshtha 3 years ago +3

      @@jonhoo
      Well, I should thank you for being so considerate of us. I do get flash-blinded when someone in a youtube video opens a very bright website. Maybe it's because I have become too dependent on dark mode.

  • @Lexikon00
    @Lexikon00 3 years ago +2

    So, if I increment a counter in different threads and I just care about fetching the result after all threads have finished, I can just use Relaxed, because there are no reads, just writes, and therefore the ordering is not relevant? Given that we take care that everything happens in one execution.

    • @OMGclueless
      @OMGclueless 3 years ago

      Yes, this is mostly correct. Your statement that "there is no read just writes" isn't correct, updating a counter is both a read *and* a write. But when updating a counter you probably don't care what value your thread reads, so it's OK to use relaxed.
      More precisely, if you use `fetch_add` to guarantee that the updates to the counter are atomic, and you call `.join()` on all the threads to provide the Happens-Before relationship that guarantees that the threads' updates to the counter are visible to you before you read the final value, then you can do all the updates with `Ordering::Relaxed` and things will work.
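
      A sketch of exactly that pattern (hypothetical thread and iteration counts): Relaxed fetch_add for the increments, join() for the final visibility.

          use std::sync::atomic::{AtomicU64, Ordering::Relaxed};
          use std::thread;

          static COUNTER: AtomicU64 = AtomicU64::new(0);

          fn main() {
              let handles: Vec<_> = (0..4)
                  .map(|_| {
                      thread::spawn(|| {
                          for _ in 0..1_000 {
                              // Atomic read-modify-write: no increment is lost,
                              // even with Relaxed ordering.
                              COUNTER.fetch_add(1, Relaxed);
                          }
                      })
                  })
                  .collect();
              for h in handles {
                  // join() provides the happens-before edge that makes all
                  // the threads' increments visible to this thread.
                  h.join().unwrap();
              }
              assert_eq!(COUNTER.load(Relaxed), 4_000);
          }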

  • @jeffg4686
    @jeffg4686 3 years ago

    Hi Jon, I have a small question regarding std::cell::Cell and std::cell::RefCell, if you happen to have a minute. So, I get the concept of being able to change an inner value through an immutable reference, and I have seen examples where a clone() implementation takes a shared reference to self but also needs to update some other member of self; hence the issue, since it's only a shared reference, and so the workaround is to use these cells. I was wondering: do you tend to see this usage often outside of things like clone() implementations? I guess it's good to know these cells are there and what they do, but when thinking about when I'd ever use them, I see it as quite rare, though good to know they exist. Is that your take, or are these actually used all over the place in typical Rust code? Thanks for any thoughts.

    • @jonhoo
      @jonhoo  3 years ago +1

      I most commonly see it in use cases where you need mutable access to multiple subsets of a collection that are determined at runtime. This comes up frequently in graphs, but that's not the only place. In those cases, you have multiple & to the collection, and then take out &mut through something like RefCell as you, say, traverse nodes in topological order.
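
      A tiny sketch of that pattern (hypothetical node type, not from the stream): several shared references into one collection, with RefCell handing out the temporary &mut.

          use std::cell::RefCell;

          struct Node {
              value: RefCell<i32>,
          }

          fn main() {
              let nodes = vec![
                  Node { value: RefCell::new(1) },
                  Node { value: RefCell::new(2) },
              ];
              // Two shared references to the same collection...
              let (a, b) = (&nodes, &nodes);
              // ...can still mutate individual nodes through RefCell, with
              // the exclusivity check deferred to runtime.
              *a[0].value.borrow_mut() += 10;
              *b[1].value.borrow_mut() += 20;
              assert_eq!(*nodes[0].value.borrow(), 11);
              assert_eq!(*nodes[1].value.borrow(), 22);
          }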

  • @droidgod3
    @droidgod3 3 years ago

    Is it possible to get the code for this episode as a gist or git repo?
    Sometimes it is hard to follow and I have to rewind all the way because I forgot the context when switching between examples.

  • @adamzimmerman1
    @adamzimmerman1 3 years ago +1

    Gonna leave a plug for "Designing Data-Intensive Applications" by Martin Kleppmann. It talks about a lot of the same things in the context of distributed systems and databases. Specifically in the context of eventual consistency vs. other read/write ordering guarantees, a lot of the same thinking applies. It's a great book, especially if you're working with those types of systems.

    • @jankarsten4593
      @jankarsten4593 2 years ago

      Honestly one of the best books I've read

  • @pictureus
    @pictureus 1 year ago

    When you say in the beginning that one thread may never see the update:
    If you have memory X = 1,
    thread A writes 2 to X in a loop, and
    thread B reads X in a loop,
    wouldn't thread B eventually get the value 2, just not in the same step, as you mention later?

  • @xinhao6245
    @xinhao6245 3 years ago

    great video to explain the details, thanks

  • @mindstorm3854
    @mindstorm3854 3 years ago +3

    omg I will need to see this video 5 times to understand half

  • @marlls1989
    @marlls1989 3 years ago

    On the counter example, wouldn’t relaxed allow the counter to be updated to the same value twice? Like, both load a previous value and both add 1 to the same old value.

  • @shuangzhu907
    @shuangzhu907 1 month ago

    Thanks for the great tutorial!

  • @fbytgeek
    @fbytgeek 2 years ago

    @Jon Gjengset, is it possible to do a stream on Rayon or parallelism in Rust?

  • @ahuggingsam
    @ahuggingsam 1 year ago

    So I know I'm late to the party, but I do have a question about the Ordering::SeqCst example. Does this amount of synchronization mean we end up regressing to almost single-threaded performance again? Or can the compiler/CPU ensure this ordering while still maintaining concurrency (with obviously at least some overhead for the synchronization)?

    • @LtdJorge
      @LtdJorge 3 days ago

      A bit late, but no, SeqCst should still be much faster than single-threading, provided the algorithms are implemented appropriately. To synchronize, the different cores have channels, and when an atomic instruction is used that requires synchronization, the thread uses a protocol over those channels to notify the relevant cores. Read up on the MOESI protocol. This stalls the participating cores a bit, but is still much faster than doing everything on the same thread, as the pauses are negligible compared to long-running multi-threaded tasks.

  • @DHorse
    @DHorse 1 year ago

    Great topic!

  • @hrmny_
    @hrmny_ 3 years ago +1

    I'm not sure whether I understand the example with `SeqCst`. Don't atomics still have the same memory address when they change value?
    Can the CPU just move that around?
    And wouldn't it be horrible if it did, but the load could still refer to the old location?
    How can it, at that point, even guarantee that it ever gets the new value and that your references to the atomic don't become stale immediately?
    How can `load` get an old value after it has been changed in memory 🤔

    • @aGuyFromSomewhere
      @aGuyFromSomewhere 3 years ago +5

      I think this may, for example, come down to caching. When threads run on different cores, both cores may have a given value copied in their caches. Both cores may modify the value in their cache without the other core seeing the modification. If the ordering of the modifications does not specify a clear "happened before" relation between them, then the cores may choose not to invalidate their caches before the modifications, which would be costly. This is probably also why stricter orderings (than relaxed) are considered to incur a performance cost. The cores cannot optimize as freely (with caches and out-of-order execution and the like) when an ordering is specified. So if performance is a concern, you always want to find the "most relaxed" ordering that still makes the program work correctly.
      Anyway, caching is just my guess; there might also be other reasons for threads to see different values of the same memory location.

  • @TheClonerx
    @TheClonerx 3 years ago

    Even for C++ this is really useful

  • @BryanChance
    @BryanChance 3 years ago

    This is new to me. My mind blew up after 1:58:00. LOL

  • @arpandhatt6011
    @arpandhatt6011 2 years ago

    If we were to acquire and release atomic variables inside of a loop, can memory ordering allow the CPU to wrap operations around the loop? For example, if we did a Release operation at the end of the loop, the contents of the loop will now be visible to other threads. If the CPU correctly predicts that the loop will run again, it will have already begun processing instructions for the next iteration of the loop. Is it guaranteed that it won’t move these operations ahead of the release synchronization from the previous iteration of the loop?

    • @jonhoo
      @jonhoo  2 years ago +1

      Ah, no, the memory model generally guarantees in-order semantics within a given thread. Reordering is only with respect to the actions of other threads.

  • @manuelaraujo2571
    @manuelaraujo2571 1 year ago

    I got confused by the argument for why z=0 is impossible in the SeqCst example. I think the argument should be that if t1 does not add, it must be that it sees y = false after it sees x = true, which establishes that the x load happens before the y load, so t2 must see x = true after it sees y = true and it adds. Similarly, if t2 does not add then this means the y load happens first and so t1 adds.
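
    For reference, a Rust transcription of the example under discussion, as I understand it from the thread (a sketch, not the stream's exact code; the same shape appears on cppreference):

        use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::SeqCst};
        use std::thread;

        static X: AtomicBool = AtomicBool::new(false);
        static Y: AtomicBool = AtomicBool::new(false);
        static Z: AtomicUsize = AtomicUsize::new(0);

        fn main() {
            let tx = thread::spawn(|| X.store(true, SeqCst));
            let ty = thread::spawn(|| Y.store(true, SeqCst));
            let t1 = thread::spawn(|| {
                while !X.load(SeqCst) {}
                if Y.load(SeqCst) {
                    Z.fetch_add(1, SeqCst);
                }
            });
            let t2 = thread::spawn(|| {
                while !Y.load(SeqCst) {}
                if X.load(SeqCst) {
                    Z.fetch_add(1, SeqCst);
                }
            });
            for t in [tx, ty, t1, t2] {
                t.join().unwrap();
            }
            // Under SeqCst every thread agrees on one total order of the two
            // stores, so at least one of t1/t2 must see both flags set.
            assert_ne!(Z.load(SeqCst), 0);
        }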

  • @Baptistetriple0
    @Baptistetriple0 2 years ago

    Ok, I have a question on the possible reordering of the program by the CPU when operations are independent. If I print something in my function, but the printed value doesn't depend on any operation in the function, like if I just put "println!("test");", is the compiler allowed to move that print around within the function? Or, because IO is considered to have side effects, can it not assume there is no link with the other operations?

    • @LtdJorge
      @LtdJorge 3 days ago

      The compiler won't reorder entire functions; what it will reorder is memory loads and stores. Same for the CPU. Atomics are all about the super-low-level load and store primitives.

  • @pm71241
    @pm71241 3 years ago

    Hmm... this loom thing reminds me of Kyle Kingsbury's "jepsen" and "Knossos" software to check distributed databases for consistency issues.

  • @jeffg4686
    @jeffg4686 3 years ago

    A little confused on Box::leak. So leak gives you back a reference to the actual value and consumes the owned binding, but I'm confused as to why one would want that over just allocating with Box::new alone. What benefit does that yield? Thanks

    • @jonhoo
      @jonhoo  3 years ago

      Box::leak gives you a &'static, which you can then give out many copies of, including to other threads. If you have a Box, any reference to the Box is tied to the lifetime of the Box (i.e., it isn't 'static), so you can't send it to other threads and such!

    • @jeffg4686
      @jeffg4686 3 years ago

      @@jonhoo - gotcha thanks much. Actually, kinda wondering what makes this thread safe? 'static lifetime doesn't itself imply thread safe does it?

    • @SimonBuchanNz
      @SimonBuchanNz 3 years ago +1

      @@jeffg4686 the short version is that it's a non-mut reference, eg a "shared" reference, and the type is an atomic which implements Sync and thus is declaring itself safe to use from multiple threads at the same time.
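
      Putting the whole exchange together in a sketch (hypothetical counter): Box::leak yields a &'static AtomicUsize, which is Copy and points at a Sync type, so it can be handed to any number of threads.

          use std::sync::atomic::{AtomicUsize, Ordering};
          use std::thread;

          fn main() {
              // Trade ownership of the allocation for a &'static reference.
              let counter: &'static AtomicUsize =
                  Box::leak(Box::new(AtomicUsize::new(0)));

              let handles: Vec<_> = (0..4)
                  .map(|_| {
                      // &'static is Copy, and AtomicUsize is Sync, so sharing
                      // the reference across threads is sound.
                      thread::spawn(move || counter.fetch_add(1, Ordering::Relaxed))
                  })
                  .collect();
              for h in handles {
                  h.join().unwrap();
              }
              assert_eq!(counter.load(Ordering::Relaxed), 4);
          }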

  • @correabuscar
    @correabuscar 4 months ago

    1:21:40, that Relaxed (on fail): doesn't it mean it uses that memory ordering to load the value, even before the compare happens? Or is it internally split into two operations: a load for the compare-and-exchange, and, if the compare failed, a load (again!!) for the actual value, with the two grouped into one atomic operation? That doesn't seem to make sense. So doesn't it load the value with the failure ordering (aka Relaxed in this case), compare that value, and, if it's the same as expected, do the store with the success ordering (here Acquire), with the load and (compare-and-)store themselves internally atomic?! Otherwise it seems kinda weird that it would load the same value twice, once for success and once for fail. So if what I expect is correct, then the ordering args would be Release (for success) and Acquire (for failure), because it would load it with Acquire and store it with Release, or, well, maybe storing it with Release isn't needed? Unclear on this part. But I'm only talking about compare_exchange_weak here, if it wasn't clear, not the last store of unlocked.

    • @correabuscar
      @correabuscar 4 months ago

      The doc says this:
      "compare_exchange_weak takes two Ordering arguments to describe the memory ordering of this operation. success describes the required ordering for the read-modify-write operation that takes place if the comparison with current succeeds. failure describes the required ordering for the load operation that takes place when the comparison fails. Using Acquire as success ordering makes the store part of this operation Relaxed, and using Release makes the successful load Relaxed. The failure ordering can only be SeqCst, Acquire or Relaxed."
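
      The shape being discussed, as a sketch (a minimal spinlock acquire loop, not the video's exact code): one ordering for the successful read-modify-write, another for the failed load; there is no second, separate load on failure.

          use std::sync::atomic::{AtomicBool, Ordering};

          pub struct SpinLock {
              locked: AtomicBool,
          }

          impl SpinLock {
              pub const fn new() -> Self {
                  SpinLock { locked: AtomicBool::new(false) }
              }

              pub fn lock(&self) {
                  // Success: one atomic read-modify-write with Acquire.
                  // Failure: the same access acts as a plain load with Relaxed.
                  while self
                      .locked
                      .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
                      .is_err()
                  {
                      std::hint::spin_loop();
                  }
              }

              pub fn unlock(&self) {
                  self.locked.store(false, Ordering::Release);
              }
          }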

  • @error200http
    @error200http 3 years ago

    1:14:47 C++ docs are enlightening

  • @jacklong2182
    @jacklong2182 3 years ago

    Do you use the vim NERDTree plugin? How do you navigate between different files?

    • @jonhoo
      @jonhoo  3 years ago +1

      Nope. I just use fzf.vim :) github.com/jonhoo/configs/blob/26409d160dff9a667d3fcb854bf83b37b544b154/editor/.config/nvim/init.vim#L28

  • @daniellambert6207
    @daniellambert6207 3 years ago

    2:39:17 always an educational experience :)

  • @austecon6818
    @austecon6818 2 years ago +1

    I think my take-home message from this has been: if I do multi-threading in Rust, just use a damned Mutex.

    • @theroboman727
      @theroboman727 2 years ago

      At least learn channels as well. They fit some problems much better. The book has a subchapter on them.

  • @Ergzay
    @Ergzay 3 years ago +4

    I have a problem with the part at 1:53:05, you say "there's no requirement that x is true even though tx has run". That's simply false. An Ordering::Acquire will see all Ordering::Release of the same variable, if it's run. Did you mean to say "even though ty has run"?

    • @jonhoo
      @jonhoo  3 years ago +5

      The C specification is actually extremely vague on this front. From the C specification section 5.1.2.4: "The value observed by a load of an atomic object depends on the “happens before” relation, which in turn depends on the values observed by loads of atomic objects. The intended reading is that there must exist an association of atomic loads with modifications they observe that, together with suitably chosen modification orders and the “happens before” relation derived as described above, satisfy the resulting constraints as imposed here." Section 7.17.3.1 does say that "[i]mplementations should make atomic stores visible to atomic loads within a reasonable amount of time", but that's really all. Ultimately, by my reading at least, if two threads A and B have no relationship except that A writes x (with Release) and B reads x (with Acquire), it doesn't _seem_ like there is a requirement that B _will_ see A's write to x. There _is_ a requirement that _if_ B sees A's write to x, then B's read of x "happens after" A's write to x. Now, _in practice_ you won't generally observe this oddity, since chances are that there are also other "happens before" relationships that make the store visible anyway.
      The reason I conclude that this is the right reading of the standard, as opposed to "any acquire load of x that occurs after (by wall-clock time) a release store of x _will_ see the x", is that otherwise the example given in the reference (en.cppreference.com/w/cpp/atomic/memory_order#Sequentially-consistent_ordering) would not require SeqCst - we'd be back to "either tx or ty happened first" even for Acquire/Release ordering, in which case an outcome of 0 is impossible. But the reference says that 0 _is_ possible under Acquire/Release, which must mean that it's possible for tx to run, but its store only being visible to one thread but not another.

    • @Ergzay
      @Ergzay 3 years ago +1

      @@jonhoo Thanks for the response. I'm not quite convinced, but I'll think on this some more. I'm wondering if this is maybe a difference between C11 and C++11 given that you're mixing the two languages here.

    • @jonhoo
      @jonhoo  3 years ago +1

      @@Ergzay They both have the same memory ordering model as far as I am aware, I just had the C11 spec more readily available :)

    • @Ergzay
      @Ergzay 3 years ago

      @@jonhoo As a side note, if you have sway in the loom project, could you talk them into increasing MAX_THREADS or making it configurable? It's currently hard-coded as a const with the value 4, which makes it impossible to use loom with this example, as there are 5 threads including the main thread. Using loom with this example, it just instantly panics. That makes it hard for me to understand how it can be used for anything other than trivial examples, given it's so limited.

    • @jonhoo
      @jonhoo  3 years ago +1

      @@Ergzay Yeah, it's a known challenge. Part of the problem is that loom needs to track a lot of per-thread state, and it does so with fixed size arrays to avoid allocating (which would kill performance). Bumping the number of threads would thus also bump the size of the entire working state fairly drastically. Though I wonder if maybe it'd be possible to use the new const generics to make it configurable!
      You can actually model this case in loom just by doing the store to x in the main thread. In general I've found it to be rare that you need more than four threads to replicate any given race condition at the level of atomics.

  • @Dygear
    @Dygear 2 years ago

    I wonder if the data race for the locking of the mutex is more likely on big.LITTLE architectures because of the different speed of the P and E cores.

    • @Dygear
      @Dygear 2 years ago

      In the Ordering::Relaxed section, y can store into x's 42 value because they happen at the same time? The let r1 is step a1, the let r2 is step b1, and they happen at the same time as they are in different threads. The x.store is step a2, and the y.store is step b2. Because it's relaxed, the compiler and the CPU are free to execute these operations in any order, so a1 can happen at the same time b2 is happening, meaning we see y's stored value cross over into x.

    • @Dygear
      @Dygear 2 years ago

      Getting a Raspberry Pi 4 to show the Relaxed ordering failing could be interesting.
      With how powerful Ordering is over how the compiler will execute the statements it sees, it would almost be helpful to have syntax lowlighting in the background of a different color to show where the relationships start and end.

    • @Dygear
      @Dygear 2 years ago

      So atomics are single assembly instructions; an instruction is only considered atomic when no other application, thread, process, or core has a chance to interject itself into your program flow, because it's a single instruction. Electrons are in flight changing the value; nothing else can happen until that's done.

    • @Dygear
      @Dygear 2 years ago

      1:30:55 - Line 864: What on earth is `x @ Ok(_)`?! I've never seen that before.
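
      For anyone else wondering: `@` is Rust's binding-pattern syntax; it binds the whole matched value to a name while still testing the inner pattern. A tiny standalone sketch:

          fn main() {
              let res: Result<u32, ()> = Ok(7);
              match res {
                  // Tests that res matches Ok(_) AND binds the whole Result
                  // (not the inner 7) to x.
                  x @ Ok(_) => println!("matched, whole value is {:?}", x),
                  Err(_) => println!("error"),
              }
          }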

  • @yuan.pingchen3056
    @yuan.pingchen3056 1 year ago

    Is the Rust language capable of writing a single-process, single-thread console program with no atomic data types?

    • @jonhoo
      @jonhoo  1 year ago +1

      yes

    • @yuan.pingchen3056
      @yuan.pingchen3056 1 year ago

      @@jonhoo This is good news, but it seems like it doesn't have the capability to offload workloads to the GPU, does it? (Heterogeneous computing)

  • @Blure
    @Blure 3 years ago +14

    NOM NOM NOM NOM

  • @timvw01
    @timvw01 2 years ago

    This is great

  • @xwtek3505
    @xwtek3505 4 months ago

    2:36:45 uh, Rust has .as_ptr(), which is surprisingly safe, as dereferencing it is unsafe anyway.

  • @saseoham
    @saseoham 3 years ago

    Thank you!!!!!

  • @yailkalaf8542
    @yailkalaf8542 5 months ago

    Thanks

  • @pkunk
    @pkunk 3 years ago +3

    Explaining atomics without an Indiana Jones reference.

  • @garlicxu4768
    @garlicxu4768 3 years ago

    Nice job, btw why not just get a dark mode browser plug-in 😂

  • @keithmiller8510
    @keithmiller8510 1 year ago

    I knew the difference but 16:00 really gave me an ah-ha moment on usage.

  • @focker0000
    @focker0000 2 months ago

    1:05 the magic happens in the compiler!!!

  • @mohsanabbas6835
    @mohsanabbas6835 3 years ago

    Yum num yum

  • @xderfszvfthgycu978
    @xderfszvfthgycu978 3 years ago

    underrated

  • @uvarovrs40
    @uvarovrs40 3 years ago

    Can machine-instruction reordering make the function below panic by executing line B before line A?
    fn foo() -> Result<(), MyErr> {
        ...
        if my_vec.len() < 2 { return Err(MyErr::TooShortErrMsg); } // line A
        let last_value = my_vec.last().unwrap(); // line B; last() returns None if my_vec is empty
        ...
        Ok(())
    }

    • @jonhoo
      @jonhoo  3 years ago +1

      No - the compiler must respect "program order" as far as any visible side-effects. The reason it can seemingly reorder operations when multiple threads are involved is that it's allowed to choose semi-arbitrarily which other threads' writes are visible at what time.

  • @어제태어남
    @어제태어남 2 months ago

    19:36 memo

  • @deadskrillaskrit2078
    @deadskrillaskrit2078 3 years ago

    Bruh ya selling me on coding.

  • @user-ov5nd1fb7s
    @user-ov5nd1fb7s 1 year ago

    ruclips.net/video/rMGWeSjctlY/видео.html
    Instruction-level parallelism is what you mean, I think.

  • @user-ov5nd1fb7s
    @user-ov5nd1fb7s 1 year ago

    ruclips.net/video/rMGWeSjctlY/видео.html
    I think you wanna say data dependencies.

  • @critamine
    @critamine 2 months ago

    wtf this guy is a vim demon 😭

  • @miketurner3461
    @miketurner3461 2 years ago

    Stop trying to make fetch happen

  • @mmmvvkk
    @mmmvvkk 2 years ago

    the low freq. feedback of your keyboard resonating through your microphone when you're typing, that's really annoying :>