This talk has been featured in the last issue of ⭐Tech Talks Weekly newsletter. Congrats Alex! 👏
What connects the talk to its title? It's mainly about concurrency primitives, but little about the memory model behind it.
2:03 the talk starts here
A couple of comments:
1. Volatile variables have atomic semantics in MSVC on Windows.
2. The slide regarding compiler barriers includes a non-empty asm block with an mfence instruction. This serves as both a compiler barrier and a memory barrier. However, an empty asm block is sufficient to act as a compiler barrier (but not a memory barrier). As far as I know, atomic_signal_fence is the standard way to express a compiler barrier.
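A minimal sketch of that distinction, assuming GCC/Clang-style inline asm (none of this is the slide's exact code):

```cpp
#include <atomic>

void compiler_barrier_only() {
    // Empty asm with a "memory" clobber: the compiler must not move
    // memory accesses across it, but no fence instruction is emitted,
    // so the CPU is still free to reorder.
    asm volatile("" ::: "memory");

    // Portable equivalent: orders compiler-visible accesses only.
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

void full_barrier() {
    // mfence plus the clobber: both a compiler barrier and a hardware
    // memory barrier on x86.
    asm volatile("mfence" ::: "memory");

    // Portable equivalent: constrains both the compiler and the CPU.
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```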
34:34 note that atomic operations are non-blocking, so in this slide Thread 2 probably needs to check that r1==1 before assigning “r2 = a;”
Actually, I've seen such mistakes in lock-free data structure implementations; it's a common mistake.
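For readers who haven't seen the slide, the corrected pattern presumably looks something like this (the variable names come from the comment above; the rest is an assumption, not the slide's code):

```cpp
#include <atomic>

int a = 0;               // plain payload, published through r1
std::atomic<int> r1{0};  // flag

void thread1() {
    a = 42;                                   // write the payload
    r1.store(1, std::memory_order_release);   // publish it
}

void thread2() {
    // Without this check, the acquire load may not yet have observed the
    // release store, and reading 'a' would race with thread1.
    while (r1.load(std::memory_order_acquire) != 1) {
        // spin until the flag is set
    }
    int r2 = a;   // now guaranteed to read 42
    (void)r2;
}
```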
54:39 x86 inc is not atomic without a lock prefix. The code works because it is protected by the spin lock. Still a great usage example of acquire/release. Thanks for the great survey of concurrency support in C++.
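Roughly the pattern being described, as a sketch rather than the slide's exact code: the increment itself is a plain, non-atomic inc, and the acquire/release spin lock is what makes it safe.

```cpp
#include <atomic>

std::atomic_flag lock = ATOMIC_FLAG_INIT;
int counter = 0;   // plain int: ++counter compiles to an inc without a lock prefix

void increment() {
    // Acquire the spin lock.
    while (lock.test_and_set(std::memory_order_acquire)) {
        // spin
    }
    ++counter;   // not atomic by itself, but serialized by the lock
    lock.clear(std::memory_order_release);   // release publishes the new value
}
```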
The only use I have found for volatile is accessing hardware registers. You need to tell the compiler that every read/write must really happen, but you also have to make sure the CPU's MMU has flagged those memory areas as non-cacheable so it will actually do what you want.
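The typical shape of that use case looks like this; the addresses and register layout are invented for illustration, and the non-cacheable mapping has to be set up separately via the MMU/page attributes.

```cpp
#include <cstdint>

// Hypothetical memory-mapped UART registers; real addresses come from the
// hardware manual, and the region must be mapped non-cacheable.
volatile std::uint32_t* const UART_STATUS =
    reinterpret_cast<volatile std::uint32_t*>(0x40000000);
volatile std::uint32_t* const UART_DATA =
    reinterpret_cast<volatile std::uint32_t*>(0x40000004);

void uart_put(std::uint8_t byte) {
    // volatile forces the compiler to re-read the status register on every
    // iteration instead of hoisting the load out of the loop.
    while ((*UART_STATUS & 0x1u) == 0) {
        // wait for the "TX ready" bit
    }
    *UART_DATA = byte;   // the write really happens, and in order
}
```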
Dear Alex & Co! Thank you for sharing your knowledge. Please, Quora: is it important to understand how computer memory works?
Yes it is really important
You need to keep in mind: before atomic instructions, volatile was the ONLY synchronization available in C and C++, so in practice it works as such, even though the standard doesn't require it.
24:26 JF Bastien’s talk from cppcon 2019: ruclips.net/video/KJW_DLaVXIY/видео.html
What is it with C++ programmers thinking Volatile is used with atomics and threads? Every talk where Volatile is mentioned there is always emphasis on "stop using it with atomics!".
Coming from C, that is wild to me, haha
Prior to C++11, volatile variables were MS-specific atomics in MSVC++.
38:45 Daniel Anderson’s talk from cppcon 2023: ruclips.net/video/lNPZV9Iqo3U/видео.html
38:36 Timur Doumler’s talk from cppcon 2022: ruclips.net/video/gTpubZ8N0no/видео.html
Is it just me, or can the bonus example ironically be fixed correctly with volatile on shared_val?
Re: Pop Quiz
What is the memory model of the CPU? If it's x86, then it's not possible, because x86 will not reorder stores with other stores: y3 should become visible before y4.
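The quiz code isn't reproduced in this thread, but the question appears to be the classic store-ordering litmus test, roughly:

```cpp
#include <atomic>

std::atomic<int> y3{0}, y4{0};

void writer() {
    y3.store(1, std::memory_order_relaxed);
    y4.store(1, std::memory_order_relaxed);
}

void reader() {
    int r4 = y4.load(std::memory_order_relaxed);
    int r3 = y3.load(std::memory_order_relaxed);
    // Can r4 == 1 while r3 == 0?  x86 (TSO) will not reorder the two
    // stores, but with relaxed ordering the C++ memory model still allows
    // this outcome, e.g. because the compiler may reorder the stores or
    // the loads.
    (void)r3; (void)r4;
}
```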
20:48 Here it depends on the data types and how they are accessed. Either it is a data race and thus undefined behaviour, or all variables are atomic and the outcome is thus not possible, or all variables are atomic and memory_order_relaxed is used on all operations - that is the only legal way for this to happen.
std::atomic is a big mistake in the way it is formulated - and deprecating volatile is also the exact opposite of how it should be done. Volatile has a very specific meaning, and it has nothing to do with "don't reorder this", "don't optimise this", or "I need synchronisation" - the examples of misuse of the keyword mostly boil down to people not knowing what it actually means, often because people who should know better told them false things (like you said at 25:28). And it is the same with atomic operations: in no way do they mean that those operations cannot be reordered, or that they cannot be removed entirely - the only thing they mean is that an operation on that memory must behave as if uninterrupted.
What really would have been needed would have been 2 keywords and 1 functionality:
something for atomicity - "atomic"
something for dataraces - "synchronised"
a simple memory-barrier - a way to specify that this step acts as a specific memory-barrier either for all or just specific memory.
I have needed operations to be atomic, synchronised and volatile, but rarely if ever all 3 at once. Most of the time it really was just either volatile or synchronised.
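For what it's worth, the three concerns this comment separates can already be expressed with today's facilities, even if they aren't spelled as three keywords; a rough mapping (the names here are invented for illustration):

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

std::atomic<int> hits{0};      // atomicity only: relaxed implies no ordering
int shared_state = 0;
std::mutex m;                  // synchronisation: mutual exclusion, no data races
volatile std::uint32_t* status_reg = nullptr;  // volatile: every access is an observable side effect

void atomic_only()  { hits.fetch_add(1, std::memory_order_relaxed); }
void synchronised() { std::lock_guard<std::mutex> g(m); ++shared_state; }
std::uint32_t read_register() { return status_reg ? *status_reg : 0; }  // never elided
```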
He did not show the header file. Did he include pthread.h? pthread.h is fully supported on Linux, while Windows has its own mechanism for multithreading.
Where in the talk would this matter since he uses std::thread?
No pthread library here, these are C++ standard primitives. All compilers should be able to translate it to proper binaries for the OS/hardware.
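A minimal illustration (not from the talk): the same source builds on both platforms, and on Linux the standard library forwards to pthreads under the hood (just link with -pthread).

```cpp
#include <iostream>
#include <thread>

int main() {
    // No <pthread.h>: std::thread maps onto pthreads on Linux and onto
    // Win32 threads on Windows.
    std::thread worker([] { std::cout << "hello from a worker thread\n"; });
    worker.join();
}
```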
The bonus case will work only for two threads
I guess you could skip the first 15 minutes but after that it goes quite fast.
Pipelining and von Neumann architecture are two different things.
You're right, but what architecture do optimizations like pipelining and superscalar processing apply to, then? 🤔
von Neumann is simply fetch-decode-execute, it doesn't really say how to do it, right?
the int i{} triggers me, why not just write int i = 0 like a normal person
Narrowing conversion checking - sure, it's not applicable in this particular case, but I've found that it helps. :)
Direct initialisation also helps in avoiding narrowing conversions
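A small illustration of the narrowing point (unrelated to the talk's own code):

```cpp
void narrowing_demo() {
    int i{};         // value-initialised to 0, same result as int i = 0
    double d = 3.7;

    int a = d;       // compiles, silently truncates to 3
    // int b{d};     // ill-formed: narrowing conversion, rejected at compile time
    int c{42};       // fine: the value fits in int exactly
    (void)i; (void)a; (void)c;
}
```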
Because c++ coders today like to overcomplicate everything.
Because later `int` can be changed to a `T` template parameter.
int i{} is one of the reasons i stopped learning this language
It seems like C++ threads are just painfully trying to evolve pthreads into this century. Every example cited would just be easier, cleaner and more portable if implemented with good old OpenMP. I can't see a compelling argument to use this stuff.
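For comparison, here is the same toy reduction both ways; neither snippet is from the talk, and whether the pragma version really counts as "cleaner" is exactly the argument above.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// OpenMP (compile with -fopenmp): one pragma, no explicit thread management.
long sum_omp(const std::vector<int>& v) {
    long total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < static_cast<long>(v.size()); ++i)
        total += v[i];
    return total;
}

// std::thread: split the range by hand and join the workers.
long sum_threads(const std::vector<int>& v, unsigned n = 4) {
    std::vector<long> partial(n, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (v.size() + n - 1) / n;
    for (unsigned t = 0; t < n; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(v.size(), std::size_t{t} * chunk);
            const std::size_t end   = std::min(v.size(), begin + chunk);
            partial[t] = std::accumulate(v.begin() + begin, v.begin() + end, 0L);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```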