@@giorgionegro5750 Oh, it is always compilable. But the result will be buggy if you e.g. push BC onto the stack and then pop B and C separately. There is no standard header known to me to generate different code, except the one for the Linux kernel. But one can check at startup whether the assumptions were correct, like: union { uint32_t a; unsigned char b[4]; } u; u.a = 0x12345678; switch (u.b[2]) { case 0x34: /* little endian */ break; case 0x56: /* big endian */ break; default: /* weird, mixed-endian like the PDP-11 */ break; }
Converting between types (as in the IP example) is called 'type punning'. Historically it was poorly defined in the C standards, but it works in GCC for both C and C++ via a documented extension. C99 (as of Technical Corrigendum 3) and later standards, including C11 and C18, clarify the situation in a footnote and explicitly allow it in C. Note: you need to be careful with endianness; the uint32 representation of the IP address differs based on endianness, so there are portability considerations.
I do wonder if it is allowed in this particular case, where strcpy() is used to write characters to some of the elements of Onion.str. If I understand right, C does not allow you to read a 4-byte union member after you write to a smaller union member such as a char, and this seems very similar to that situation.
I use unions in embedded programming as you described. Another trick for mapping a register to a variable is bit-fields, where you can make members take a defined number of bits.
@EMLtheViewer, on a microprocessor you'll often need to configure peripherals like a UART or timers. To do so you often need to set a number of bits in a register correctly. There are a number of ways of doing this; none of them are wrong, but some are more difficult to read or maintain than others. For example: UARTConfig = 0x74; Although this will work, anyone reading or maintaining this code would need to refer to the processor data sheet to work out what it does. Instead: union { struct { uint8_t Enable:1; uint8_t Parity:2; /* None, even, odd */ uint8_t Prescale:5; } bits; uint8_t raw; } UARTconfig; UARTconfig.bits.Enable = 1; UARTconfig.bits.Parity = Even; /* assuming you've defined an enum or some #defines for this */ UARTconfig.bits.Prescale = CalcUARTPrescale(19200); Then you write UARTconfig.raw to the register. It is easier to read and maintain. Bear in mind that it is up to the compiler author to decide the order in which the bit-field is populated, so the code is not portable.
I use unions all the time. They are really useful. I use them most when writing a lexer/parser (though I use them elsewhere too), e.g. typedef struct { uint32_t line; uint32_t col; enum { IDENT, NUMBER, STRING } kind; union { struct { char *data; uint32_t len; } string; uint64_t number; }; } Token;
I use unions to pack 8-byte CAN messages all the time. They're also very useful with bit-fields: instead of checking eight or sixteen individual bit flags, you just check whether the entire thing is zero.
@@xhivo97 yeah, your code might not be portable, but in all my applications so far it didn't need to be. It could break if one system had a different endianness.
@@xhivo97 The bit ordering is what's implementation-defined here. AFAIK you can't even rely on tests for endianness, so you need to test bit-fields on a specific compiler/processor if you want to support it. As @feeditehh said, in practice it works fine almost everywhere, but the next microprocessor that comes out might just find it more efficient to do things in the opposite order.
Unions are great! It's extremely useful to be able to interpret a single piece of data as different types, structures or even arrays. Say you have a 32 bit pixel representing RGBA channels, sometimes you may want to access individual channels with their own unique names as you would in a struct, sometimes as raw byte arrays and maybe sometimes you want to simply assign a 32 bit value to the whole pixel.
Unions are a nice thing to have in a barebones language like C, where you can reinterpret the type of a variable by casting a pointer to it; a union is just a fast and concise way to do basically the same thing.
I do embedded programming and use unions quite a bit. I think they are very useful, but they have their problems. For instance, if your code is targeting two different platforms with different endianness, then multi-byte unions will give you a very bad day!
I've used unions for some embedded stuff but most of the time I just end up using a bunch of bitwise operations instead. Never thought about using them for polymorphism though, that is actually pretty cool!
I discovered unions through cppreference, and when I was writing my own JSON parser as an exercise, unions were the core of the library's design. They are not used very often, but sometimes they are irreplaceable :) Edit: and they are what makes C a more functional language than Python and many other popular ones, because unions and structs are basically algebraic data types.
Learned about unions when I was teaching myself C back in the early 80s... have used them a lot over the years... very common in systems and embedded programming, data conversion, and the like.
I learned unions quite early in my C learning process, and I find them extremely useful for all the cases you stated here and more. For making VMs for in-development hardware and domain-specific languages they are a godsend, and they also come in handy in game engine programming. Hell, my yet-to-be-uploaded codebase for a dead simple, easy to use and maintain Forth with readable code, focused on being embeddable in applications as a Lua replacement, uses them. Speaking of which, I've got a video suggestion: Forth! It has seen its fair share of use in embedded systems.
Prior to database management systems, data was written in records composed of fixed-length fields. Unions were used to reinterpret the layout of those records, where the type was indicated at the start of the record to simulate a tagged union similar to your last example, and to provide ways to access elements within a field, e.g. 8 chars for the whole date unioned with 2/2/4 chars for month/day/year. Storing data like this is why the strn* functions exist in the standard library; they weren't intended to be "safe" versions of the non-'n' variants, as people started suggesting in the 90s. I don't know if it is useful to learn how computing was done prior to the 80s, as a lot of it isn't relevant today unless you find yourself interacting with COBOL, but it is helpful if you want to know where some of this stuff comes from.
@ 2:00 - this isn't valid in C++, and is in fact undefined behavior due to the object lifetime rules: only one member of a union may be active at any time. What you're demonstrating here is type punning, which is considered a 'wrong' use of unions in C++. Instead, use memcpy() (which the compiler will typically optimize down to a plain register move).
I love it because it can make complicated things easier. I used it for a messaging protocol, where the shared object is a header with a type, and the buffer after that depends on the header, along with a CRC.
I've used unions to make my C++ code more readable. I had a Vector3 class that I would use to represent rgb color values, and xyz coordinates. Instead of having two Vector3 types, or storing values twice, I used unions for each of the floats.
This right here. Great in graphics/game programming when trying to cannibalize memory (especially to have structs/data fit on a single cache line (64-bytes, generally)).
Every book on C programming I've ever used has had a section teaching about "structures and unions." Now, I've rarely ever used unions, but I appreciate how they can give your program an alternate view of your data. I can think of several interesting ways to use this feature, but very few practical ones.
Hey LLL, can you do a video on how SIMD works under the hood? Because it's relatively new, not many assembly textbooks cover it and how to write programs to take advantage of it.
Under the hood, a CPU has more than just 1 'adder'; it has several ALUs. SIMD is aligning those ALUs to do the same thing (add) at once (or double pumping them). The cost is generally power, and some developer setup to align the data to the boundary (ignoring unaligned SIMD on Intel). In short, in assembly the easiest example is a memcpy: while (addr & 0x3) copy_byte(); while (addr & (16 - 1)) copy_uint32(); then copy via SIMD.
SIMD was first implemented in a computer in 1966. I wouldn't exactly call it "new". And even the modern variant with SSE was introduced in 1999. Basically all they are is that instead of only using two registers for e.g. an add instruction, you instead first load all the data you want into special arrays of registers which the operations then get applied on. After that you can move out your result
@@sinom it's crazy how some concepts are typically seen as new, even though they're old as shit: SQL, SIMD, FP (Lisp is from the 50s), etc. Edit: I have been guilty of this too, btw
Flexibility of unions in C++ however is severely reduced because of stricter safety checks so you have to usually work with some ugly reinterpret cast syntax to make it work
The first time I saw unions being used realistically was in some code for an LED light controller. The union had a struct with one byte per color, a 4-byte RGBI value, and a 4-byte array where each position was one part of RGBI.
Being an embedded developer, unions are a daily thing for me. However, there is one little thing I'd like to add - when you discuss the size of your json_t and say that the enum takes up only one byte... while that can be accurate, it isn't necessarily so. It depends on the compiler and its settings - I have worked with compilers where an enum always takes up 32 bits, as this is the native word size of the target architecture. In other cases, the minimum number of bytes needed to represent all values in the enum is used.
I've often used unions like in your last example. Usually I'd ensure correctness by using functions or macros to set both the discriminant and the value, just like your printJSON function ensures that you're correctly interpreting the data. I'd never seen unions used to provide multiple ways to read/write the same underlying data though, like with your IP address and hardware register examples, that's really neat! It makes unions an effective way to avoid the "primitive obsession" antipattern, beyond a simple "typedef int ipv4_addr".
Awesome vid! I remember when my prof was talking about unions I never really got it. I can’t believe it took me this long to actually learn it for real lol
Unions are awesome! I had a fun bug with one, though: on ARM you have to tell the compiler to pack the struct, otherwise there are padding bytes (what looked like random zeros) in the middle.
I think the first time I discovered unions was when I was trying to send a floating-point value over I2C. I spent way too long in the "let's just convert it into a string, send those bytes, and then reinterpret that as a float on the host controller" phase. Unions let you solve that whole mess by transmitting only the 4 bytes, or however many make up that data type.
It is interesting that C can already do so many things just using structs, while unions are restricted to the use cases where they really make sense. So even though I knew about them as a newbie doing tutorials, I didn't really know where and how to use them. It was only when I faced a problem that really needed a union that I learned how to use one.
I usually use unions so I can access certain struct members either by their variable name or by their array index: union { type name[3]; struct { type name1, name2, name3; }; }; This is useful for things like gamepad buttons, so I can loop over all buttons but still use their individual names (of course the same could probably be achieved with defines or a pointer, but after the optimizer has done its thing, who knows the packed order without messing with pragma pack). I've used them in random number generators too, to overlap bytes and create some wacky seeds.
Other computing languages have used union-type structures and syntax for decades. For example, COBOL has the REDEFINES clause which does exactly the same thing. It is possible that this feature was added to C in part to enable a C program to interact with software and data from other computer languages. 30 years ago, I was involved in an EDI project (Electronic Data Interchange) involving the receipt of purchase orders and the sending of invoices via X.500. One computer was an IBM mainframe, the other was a Unix minicomputer. X.500 was an expensive protocol, so EDI was designed to send the required data in the smallest packets possible, and made heavy use of REDEFINES and unions to accomplish this.
I have absolutely heard of and implemented unions in C before, the fact that this video will not stop showing up on my youtube really annoys me. I came to comment for 2 reasons: 1. This is the only video that bothers me, I love your other content
Using a union to alias an anonymous bitfield struct with a byte array and a dword is a great trick to deal with peripheral registers instead of using bitmasks so long as you can guarantee that the struct will be packed properly.
I knew about unions but try to avoid them, because they require extra care in thinking about how you are physically organizing memory in your code (think of an array of unions, for example). In my opinion, another hidden gem in C is the "restrict" keyword. Thanks a lot for the video, high quality as usual!
I was doing this union/structure stuff 20 years ago... so powerful, especially as an embedded C programmer... picking out specific bits of a byte, setting/clearing them, etc.
I had come across unions during my first-year engineering class, but there wasn't much focus on them; instead all attention was on structures (due to their use in Data Structures). I always wondered why we need them, and even most websites' explanations of them being useful for type conversion felt silly, as I can do that normally using format specifiers. But this is the first time I've seen their actual practical usage, especially the polymorphism part, which can be used in some DS experiments involving conversion between postfix, prefix, and infix expressions.
unions are pretty damned awesome, especially for doing what you did with the register or the IP address: you've got a wild array of bytes and then use a union to describe the structure of those bytes.
I have used unions to serialize floating-point numbers as their byte representation. Just write the float to the float part of the union and then read its byte array part when I need to serialize it. I just need to keep endianness in mind (but that's mostly a non-issue for me, since the communicating system is always the same MCU).
Unions are super useful for communications. You can store the CRC and the opcode alongside a union holding the payload. Then, depending on the opcode, you read the union in different ways.
You just proved that they are very useful functionality, not just some hidden peculiar oddity that exists solely because the programmer had spare time to fool around.
Something I love to do with unions is treating a multidimensional array as a single linear array for quick one off tasks that would only require a simple for loop instead of a set of nested for loops. Makes rereading my code easier and thinking a lot easier too
Depending on the use case, this might come with a massive performance hit, since it bypasses some CPU and cache optimizations available for dealing with 2D data.
@@Songfugel do you mean there are optimizations for, say, double[][], vs a double[] with the same number of elements? Can you give me any pointers on that?
@@user-sl6gn1ss8p Yes, you can search for CPU matrix optimizations (which is what 2D arrays can be), and also for how keeping the for loops nested, so that the work done in the innermost loop stays within cache limits, can make the operations much faster. Nested for loops done correctly (you don't branch, and you jump out whenever the job for that loop is done) are not bad in some contexts, like 2D number crunching. There is an amazing video about it by DepthBuffer on YT called something like "nested loops can make your code faster". However, so as not to mislead, I have to point out that nested branches (like nested if statements) can be very, very bad.
I consider unions an early precursor to polymorphism, with the capability of adding a simple RTTI concept by wrapping a union and an enum together in a struct; many people use unions this way. It's also the easiest and fastest way to understand polymorphism and its benefits.
I use them to create a bunch of aliases for the members of vector and matrix types: typedef union { struct { float x, y, z; }; float v[3]; } Vector3; typedef union { float m[9]; Vector3 row[3]; } Matrix33; typedef union { float m[12]; Vector3 row[4]; struct { Matrix33 mtx33; Vector3 translation; }; } Matrix43; Very nice when you need to pass a portion of a composed struct as an argument to a function, or access elements with a loop iterator instead of by individual names. The only drawback is that it clutters the debug watch window.
A very good explanation of unions with nice examples. I do wish to point out that, unless you pack a structure, its size will surprise a few people. Structures, arrays, and types are usually aligned to word (register size) boundaries, which are implementation-dependent. I once found a bug that hadn't shown up yet, because the array containing a string was automatically aligned. The character array was declared with 10 elements and contained 10 characters. The issue was that it represented a string, which is supposed to be null-terminated. The compiler actually aligned the array to 12 bytes, not 10, so it worked because the last 2 bytes were 0. Modifying it to be Unicode-compatible made it blow up, because 10 Unicode characters were aligned and there were no extra bytes to hide the mistake.
Huh.... food for thought. I've never used unions, but I can see how they can be extremely useful in some circumstances. Without unions I would approach those problems with bit-wise manipulations. By the way, excellent presentation.
Yeah you can use a union to do some pretty cool stuff. One is quickly read out the binary data of floats union u{ float f; unsigned int ui; }; union u val; val.f=1.0f; printf("%x",val.ui); output: 3f800000
One way I used unions a while ago was to interpret 32-bit color both as an uint8 array and 4 named 8-bit uints using: union RGBA32 { uint8 RGBA[4]; struct { uint8 R, G, B, A; }; };
I'm interested in cyber security. I know little C and have started to learn assembly. I like your low-level explanations of things and am interested to learn more. Sorry I can't afford your course, but your effort deserves something.
I've known unions as a feature, but never how to use them. The embedded one is simply genius and makes everything easier. They're amazing for network packets/frames too.
I used unions a lot in my previous job. The system's learning data would be flashed into ram all at once, and then we'd break it down into smaller and smaller structs depending on the "module" within the system. Because of legacy code, some of the structs could have slightly different type definitions across the code base. Using unions is a slightly more structured way of accessing that memory, as opposed to casting void pointers into the type that you're expecting.
Really well-presented video. I really like unions and have used them in some shape or form for most of my embedded programming, although I have a lot of stuff that can't use them as cleanly as I would like because I have to worry about endianness, but that's just how it goes.
I didn't know that unions operated like this! I've only read about them since I was curious about C's 32 keywords and had never seen this one used. I thought it only contained one type at a time so it's useful to know how they actually work! Thanks for this!
Case 3 is called a sum type in ML languages like SML, Caml, or OCaml. The possible values are the union of the values of the different member types, but... no tag is automatically included in C/C++. In ML languages, the tagging is automatic, ensuring safe execution.
I find unions very handy for all of the use cases you mentioned and more. As far as goto goes, I generally abhor it, but in certain instances it's just the cockford Ollie, and the code is more logical than writing spaghetti to get around using one. At the end of the day, your code should be as simple as possible and no simpler!
Dude, I'm taking a MicroP class right now and Unions are EVERYWHERE. Pretty much all configuration registers for devices are stored in some sort of union.
I use unions to break down types like floats and doubles for complex conversions, as well as to dynamically access arrays and heap memory. I think it's great practice for any budding programmer to learn to use them, so it's sad to hear they're relatively obscure. They're super useful even in C++ (even though much of their usage there is UB), and much faster than a lot of other built-in functions and features.
The only time that I have used unions was to overlay registers. So, I could access EAX or AX depending on whether I needed the 16 bit or 32 bit register, but that was a long time ago and it was only for a hobby project.
For type punning, be sure to use #pragma pack(push, ...) and #pragma pack(pop). Then also add an #if defined(...) check on the processor name, so that if the code is compiled for a different processor, the endianness can be checked. C++ labels it "undefined behavior", but most compilers have a defined way it is handled. C++ lacks reflection, so serialization any other way takes a lot of code. With a union, only arrays and strings have to be handled separately. Also note that the structures in the union can't be classes, but can be aggregates.
I would have thought C uses them a lot more, just because Rust is so obsessed with enums (which are essentially a type-safe version of polymorphic C unions, where the compiler enforces that you don't access the wrong variant, and the tag byte is not directly exposed).
Personally, I find unions most useful when dealing with CAN Network packets, since they can be so easily represented using unions (as they use stuffed bitfields a lot of the time)
Unions are used in quite a few places. One place where they are very useful is messaging. There is a common messaging API used by all cores and tasks. The message size is fixed, and each message carries a payload. Each task has its own representation of that payload, so the overall struct/union is what the messaging API accepts, but it is up to each task what the data represents.
I discovered union when I tried to name a function union
I discovered it from dwm
I think I learned about unions from some book when I decided to learn C.
Two keywords that are probably even more obscure are "register" and "volatile".
This is how I discovered the "register" type modifier
I discovered the register keyword when I tried to name some sensor register "register"
@HyperWin Just to add to the conversation: that keyword tells the compiler that every read and write of the variable must actually be performed as written, not cached in a register or optimized away, because the value may be used asynchronously or changed by the hardware itself or by software such as a signal handler. I used to use it for threads, because some of the code needed to work as intended and not be moved or modified by the compiler for optimization purposes in ways that would break it.
unions are also very useful for dealing with communication packets, where you have a byte array that represents your entire packet, along with each item present within it. This way you can access the entire packet (to send to other parts of your program), as well as each separate item.
@@TheCarmacon I don't know if I understand your last point about consistency... consistency in communication can be achieved in other ways, such as validating the packet in N ways before it reaches the union... Did you mean anything beyond that, like some conversion problem? I would love to know more about it.
I need an example of it. Can someone show me some demo code to understand how it works?
that's the only time I've used em
This is exactly what I've used unions for in communications between two embedded processors
@@0xDEAD_Inside Look at 3:57 in this video. In his code he's using the union to allow the hardware register to be accessed as different data types, but you use the same strategy when decoding communication protocols. You usually have an enum to represent the type of the payload, and you know which of the union'd structs to access based on the message type.
It's not a very good way to do this, though, because compiler and/or endianness differences between processors can lead to the same C code parsing the data differently.
It's convenient to access data this way, but you have to rely on compiler-specific features to make it reliable. And like with bitfields the compiler may be creating a lot of code behind the scenes, so you aren't really saving anything over just writing your own robust parsing code.
They say big corporations employ people to go into their repos and replace all C unions in commits with other corporate-approved solutions. They're known as "union busters".
Hah. If only union busting was as relatively benign as that... alas.
oh my
Take my like and go home.
The data conversion examples are cool, but it's important to note that they depend on the byte order ("endianness") and memory size of int, which can both change depending on what platform you're on.
If you end up depending on byte order without using a memcpy-family function, it probably means you have violated the type aliasing rules, and the compiler can miscompile your code.
The last thing is actually a very old concept called discriminated unions. Primarily used in functional languages, it was buried for years during the golden age of OOP languages, when it was effectively substituted by inheritance.
More modern languages are bringing it back though, in light of recent tendencies where inheritance is considered a bad thing. Like Rust has native support for discriminated unions via its enums, TypeScript has them too etc.
Recent versions of Java support them as well, along with growing support for pattern matching.
The only issue with C unions is you don't get the same compile time checks, which are a big part of the safety/refactorability of algebraic data types
I think there is no connection to OOP being bad, because even non-OOP languages mostly support basic features like dynamic dispatch through interfaces and type erasure, or even let you do manual inheritance with pointers.
Discriminated unions are just way better sometimes: in Rust, for example, an iterator's next() returns an Option, which removes the need for a separate hasNext() function to get the next element of a collection.
Sum types like you describe allow polymorphism, just like inheritance and interfaces. The problem, or maybe benefit depending on the circumstance, is that it is closed-set polymorphism. All possible concrete types must be known and specified at compile-time. You can't simply create a new type and then conform to an interface - you must go back and modify the original sum type. This can be cumbersome. Consider Rust, modifying an enum ALSO requires modifying ALL match expressions associated with it. The code change is much larger, and you *might* have to modify stuff you don't want to or aren't allowed to.
pascal calls them variant records
I have been programming for over 20 years and yet while I knew about unions I never used them. Just goes to show that an old dog can learn new tricks. I love how you really explain the low level side of things. Keep up the great work!
Did you just never need one or did you just cast between different structs, instead?
@ThePC007 I never really understood them and equated them with a struct so just used structs or more commonly classes as most of my programming is in C++ (Qt/C++ to be exact)
When you are doing any kind of protocol, you basically want to use unions. You basically switch on the opcode, i.e. the first byte, and then interpret the incoming frame accordingly. If you don't want to use unions for some reason, you can always do explicit type casting, which is not great.
@@VersDarkmoor In my experience we just use explicit write32_be/read32_le and friends to read out protocol fields. Casting unions/structs onto serialized data is a recipe for disaster once you build for a different microcontroller CPU.
@@VersDarkmoor that sounds really dumb and unsafe
I used it while making my Game Boy emulator; it helped a lot for mapping the CPU registers easily and in a lean way. The data inside some of those registers can be 8 bits (the high or low part of a 16-bit value: the A, B, C, D, E, H, L registers) or 16 bits (when combining two 8-bit registers: AF, BC, DE, HL). It's a neat feature.
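The explicit-helper approach from the write32_be/read32_le comment above can be sketched roughly as follows. These helper names and signatures are illustrative only, not taken from any particular library; the point is that shifting bytes into place gives the same result on any host, whatever its byte order or struct padding.

```c
#include <stdint.h>

/* Read a 32-bit little-endian field from a byte buffer, independent of host byte order. */
static uint32_t read32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value into a byte buffer in big-endian (network) order. */
static void write32_be(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

Unlike overlaying a struct on the buffer, this never depends on how a particular compiler lays out or pads the fields.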
But not portable due to endianness and alignment issues unfortunately.
@@Hauketal you can probably use some preprocessing to make it at least compilable for all endiannesses
@@giorgionegro5750 Oh, it is always compilable. But the result will be buggy if you e.g. push BC onto the stack and then pop B and C separately. There is no standard header known to me to generate different code, except the one for the Linux kernel. But one can check at startup whether the assumptions were correct. Like this:
// needs <stdint.h>; uint32_t avoids relying on the size of long
union { uint32_t a; unsigned char b[4]; } u;
u.a = 0x12345678;
switch (u.b[2]) {
case 0x34: // little endian
    break;
case 0x56: // big endian
    break;
default: // weird; PDP-11 ("middle-endian") order gives 0x78 here
    break;
}
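union reg_af {
    struct {
        uint8_t f; // low byte of AF on a little-endian host
        uint8_t a; // high byte of AF on a little-endian host
    };
    uint16_t af;
};
Should be fine on a little-endian host (needs <stdint.h> and C11 anonymous structs).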
Converting between types (such as the IP example) is called 'type punning'. Historically it has been poorly defined in the C standards, but does work in gcc C and C++ via a documented extension.
The draft C18 standard does clarify the situation, and explicitly allows it.
Note: You need to be careful with endianness - the uint32 representation of the IP address would be different based on endianness, so there are portability considerations.
It is worth noting that this is explicitly allowed only in C. It is an undefined behaviour in C++.
@@АнтонЕлькин-т8ъ Only technically. It's well-defined in all 3 major compilers via extensions.
Yeah, there's too much undefinedness circling around unions. This video is fairly bad teaching.
I do wonder if it is allowed in this particular case, where strcpy() is used to write characters to some of the elements of Onion.str. If I understand right, C does not allow you to read a 4-byte union member after you write to a smaller union member such as a char, and this seems very similar to that situation.
@@EarlHutchingson he also passes a value of the wrong type as the first argument of strcpy().
I use unions in embedded programming as you described. An other trick for mapping a register to a variable is bit fields where you can make members of the union take a defined number of bits.
This - so useful!
Almost all embedded c developers know it :) ..
Could you provide an example use case for this please?
@EMLtheViewer, on a microprocessor you'll often need to configure peripherals like a UART or timers. To do so, you often need to set a number of bits in a register correctly. There are a number of ways of doing this; none of them are wrong, but some are more difficult to read or maintain than others.
For example,
UARTConfig = 0x74;
Although this will work, anyone reading or maintaining this code would need to refer to the processor data sheet to work out what this does.
union
{
    struct
    {
        uint8_t Enable:1;
        uint8_t Parity:2; // None, even, odd
        uint8_t Prescale:5;
    } bits; // the bitfields must live in a struct, otherwise they would all overlap at bit 0
    uint8_t raw; // the whole register as one byte
} UARTconfig;
UARTconfig.bits.Enable = 1;
UARTconfig.bits.Parity = Even; // assuming you've configured an enum or some defines to do this.
UARTconfig.bits.Prescale = CalcUARTPrescale(19200);
Then you can write UARTconfig.raw to the register. It is easier to read and maintain.
Bear in mind that it is up to the compiler author to decide what order the bit field is populated, therefore the code is not portable.
It's the first time I see "another" written like that, and it makes sense
I use unions all the time.
They are really useful.
I use it when writing a Lexer/Parser. (Though I use it other times too, it's just where I use it the most)
e.g.
typedef struct {
uint32_t line;
uint32_t col;
enum {IDENT, NUMBER, STRING} kind;
union {
struct {
char* data;
uint32_t len;
} string;
uint64_t number;
}; // anonymous union (C11)
} Token;
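A minimal sketch of how such a Token might be consumed: the kind tag is checked before the union is read. It assumes identifiers reuse the string member, which the comment does not say, and token_format is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t line;
    uint32_t col;
    enum { IDENT, NUMBER, STRING } kind;
    union {                 /* C11 anonymous union */
        struct {
            char *data;
            uint32_t len;
        } string;           /* assumed valid when kind is IDENT or STRING */
        uint64_t number;    /* valid when kind == NUMBER */
    };
} Token;

/* Format a token into buf; the kind tag decides which union member is read. */
static int token_format(const Token *t, char *buf, size_t size)
{
    switch (t->kind) {
    case NUMBER:
        return snprintf(buf, size, "%u:%u NUMBER %llu",
                        (unsigned)t->line, (unsigned)t->col,
                        (unsigned long long)t->number);
    case IDENT:
    case STRING:
        return snprintf(buf, size, "%u:%u %s %.*s",
                        (unsigned)t->line, (unsigned)t->col,
                        t->kind == IDENT ? "IDENT" : "STRING",
                        (int)t->string.len, t->string.data);
    }
    return -1;
}
```

Because token text is stored as pointer plus length, `%.*s` prints it without needing a NUL terminator.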
I use unions to pack 8-byte CAN messages all the time. It's also very useful with bitfields: instead of checking eight or sixteen individual bit flags, you just check whether the entire thing is zero.
I hear this all the time, but I don't understand, because I thought a lot of this is UB or something like that.
@@xhivo97 it's probably technically undefined by some strange wording in the standard, but in practice it works fine everywhere.
@@xhivo97 yeah, your code might not be portable, but in all my applications so far it didn't need to be. It could break if one system had a different endianness.
Exact same use case for me; very useful for separating the controller ID from the data, the checksum, and so on.
@@xhivo97 The bit ordering is what's undefined here. AFAIK you can't even rely on tests for endianness, so you need to test bitfields on a specific compiler/processor if you want to support it. As @feeditehh said, in practice it works fine almost everywhere, but the next microprocessor that comes out might just find it more efficient to do things in the opposite order.
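The CAN-packing idea from this thread might look roughly like this minimal sketch. The field layout (controller ID, flags, data, checksum) is a hypothetical example, not a real CAN protocol, and the multi-byte controller_id is exactly the part that depends on endianness, as the replies point out.

```c
#include <stdint.h>

/* Hypothetical 8-byte CAN payload viewed three ways. */
typedef union {
    uint8_t  bytes[8];        /* raw payload as received */
    uint64_t all;             /* whole payload at once */
    struct {
        uint16_t controller_id;   /* byte order depends on the host */
        uint8_t  flags;           /* eight status bits */
        uint8_t  data[4];
        uint8_t  checksum;
    } fields;                 /* layout is an illustrative assumption */
} can_payload;

_Static_assert(sizeof(can_payload) == 8, "payload must stay 8 bytes");

/* True if any bit in the payload is set, checked with one comparison. */
static int can_any_set(const can_payload *p)
{
    return p->all != 0;
}
```

The _Static_assert is a cheap way to catch a compiler that adds unexpected padding.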
Unions are great! It's extremely useful to be able to interpret a single piece of data as different types, structures or even arrays. Say you have a 32 bit pixel representing RGBA channels, sometimes you may want to access individual channels with their own unique names as you would in a struct, sometimes as raw byte arrays and maybe sometimes you want to simply assign a 32 bit value to the whole pixel.
Unions are a nice thing to have in a barebones language like C. Since you can already reinterpret the type of a variable through pointer casts, a union is just a fast and concise way to do basically the same thing.
I do embedded programming and use unions quite a bit. I think they are very useful, but they have their problems. For instance, if your code is targeting two different platforms with different endianness, then multi-byte unions will give you a very bad day!
This can actually be turned into an advantage, because you can essentially detect endianness this way. **wink wink**
Endianness is not a union specific issue. It affects all data structures whenever multi-byte types are involved
I've used unions for some embedded stuff but most of the time I just end up using a bunch of bitwise operations instead. Never thought about using them for polymorphism though, that is actually pretty cool!
Cool - and dangerous. The compiler doesn't give any guard rails, wouldn't recommend trying at home
@@CamaradaArdi Got it, I'll try on the prod environment instead 🗿
I discovered unions through cppreference
And when I was writing my own json parser as an exercise, unions were the core of library's design
They are not used really often, but sometimes they are irreplaceable :)
Edit: and they are what makes C more of a functional language than Python and many other popular ones, because unions and structs are basically algebraic data types.
Learned about unions when I was teaching myself C back in the early 80s... have used them a lot over the years... very common in systems and embedded programming, data conversion, and the like.
OK THAT COUPLING WITH A TYPE INDICATOR IN A STRUCTURE THING WAS SICK AND IS EXACTLY WHAT I NEEDED IN MY PROJECT THANK YOU
I learned unions quite early in my C learning process, and I find them extremely useful for all the cases you stated here and more. For making VMs for in-development hardware and domain-specific languages they are a godsend, and they also come in handy in game engine programming. Hell, my yet-to-be-uploaded codebase for a dead-simple, easy-to-maintain Forth with readable code and a focus on being embeddable in applications as a Lua replacement uses them. Speaking of which, I've got a video suggestion: Forth! It has seen its fair share of use in embedded systems.
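For anyone wanting to try the type-indicator pattern, here is a minimal sketch loosely modeled on the idea of a JSON value type. The names json_t, json_number, json_format, etc. are hypothetical and not the video's exact code. The constructors set the tag and the value in one place, so the two cannot drift apart.

```c
#include <stdio.h>

typedef enum { JSON_NUMBER, JSON_BOOL, JSON_STRING } json_type;

typedef struct {
    json_type type;            /* discriminant: which member below is valid */
    union {
        double number;
        int boolean;
        const char *string;
    } as;
} json_t;

/* Constructors set the tag and the value together. */
static json_t json_number(double n)      { return (json_t){ .type = JSON_NUMBER, .as.number = n }; }
static json_t json_bool(int b)           { return (json_t){ .type = JSON_BOOL, .as.boolean = b != 0 }; }
static json_t json_string(const char *s) { return (json_t){ .type = JSON_STRING, .as.string = s }; }

/* Readers switch on the tag before touching the union. */
static int json_format(const json_t *j, char *buf, size_t size)
{
    switch (j->type) {
    case JSON_NUMBER: return snprintf(buf, size, "%g", j->as.number);
    case JSON_BOOL:   return snprintf(buf, size, "%s", j->as.boolean ? "true" : "false");
    case JSON_STRING: return snprintf(buf, size, "\"%s\"", j->as.string);
    }
    return -1;
}
```

C will not stop you from reading the wrong member, so funneling all writes through constructors is the main safety net here.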
Prior to database management systems, data was written in records composed of fixed-length fields. Unions were used to reinterpret the layout of those records, where the type was indicated at the start of the record to simulate a tagged union similar to your last example, and to provide ways to access elements within a field, e.g. 8 chars for the whole date unioned with 2/2/4 chars for month/day/year. Storing data like this is why the strn* functions exist in the standard library; they weren't intended to be "safe" versions of the non 'n' variants as people started suggesting in the 90s. I don't know if it is useful to learn how computing was done prior to the 80s as a lot of it isn't relevant today unless you find yourself interacting with COBOL but it is helpful if you want to know where some of this stuff comes from.
That’s very handy to know. Thank you.
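A rough sketch of the fixed-length record idea described above, with a date field viewable whole or as month/day/year. The record layout and helper names are hypothetical. It also shows the original purpose of strncpy: it fills a fixed-width field, padding with NULs and leaving no terminator when the input fills the field.

```c
#include <string.h>

/* Hypothetical fixed-length record: fields have no terminators. */
struct record {
    char type;              /* record type code at the start */
    char name[10];          /* NUL-padded, not necessarily NUL-terminated */
    union {
        char whole[8];      /* e.g. "12251999" */
        struct {
            char month[2];
            char day[2];
            char year[4];
        } part;             /* the same 8 bytes split as MM/DD/YYYY */
    } date;
};

/* strncpy pads short names with NULs and leaves no terminator when the name fills the field. */
static void record_set_name(struct record *r, const char *name)
{
    strncpy(r->name, name, sizeof r->name);
}

/* Copy a fixed-width field into a C string; dst needs width + 1 bytes. */
static void field_to_cstr(char *dst, const char *src, size_t width)
{
    memcpy(dst, src, width);
    dst[width] = '\0';
}
```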
@ 2:00 - this isn't valid in C++, and is in fact undefined behavior due to the object lifetime rules: only one member of a union may be active at any time. What you're demonstrating here is type punning, and it is considered a 'wrong' use of unions. Instead, use memcpy() (which the compiler will typically optimize away, so it costs nothing extra).
or simply use unions like that because it's funny
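A minimal sketch of the memcpy-based type punning suggested above, written in C. The same memcpy idiom is also well-defined in C++. It assumes float is a 32-bit IEEE 754 type, which holds on all mainstream platforms but is not guaranteed by the standard; float_bits is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Reinterpret a float's bits as uint32_t without a union or pointer cast. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* small fixed-size copy; compilers usually turn this into a move */
    return u;
}
```

For example, float_bits(1.0f) yields 0x3f800000.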
Short and sweet, really captured the essence of unions, I have never understood it better until now. Thanks!
I love it because it can make complicated things easier. I used it for a messaging protocol, where the shared object is a header with a type, and the buffer after that depends on the header, followed by a CRC.
I've used unions to make my C++ code more readable. I had a Vector3 class that I would use to represent rgb color values, and xyz coordinates. Instead of having two Vector3 types, or storing values twice, I used unions for each of the floats.
This right here. Great in graphics/game programming when trying to cannibalize memory (especially to have structs/data fit on a single cache line (64-bytes, generally)).
Do you happen to have a git repo I could check out?
Please make one about type punning and Undefined behavior (in C and C++) :))
Every book on C programming I've ever used has had a section teaching about "structures and unions."
Now, I've rarely ever used unions, but I appreciate how it can give your program an alternate view of your data.
I can think of several interesting ways to use this feature, but very few practical ones.
One of the better YouTube teachers. In this case it is quite simple, but explaining it at a fast pace without neglecting details is an art.
Hey LLL, can you do a video on how SIMD works under the hood? Because it's relatively new, not many assembly textbooks cover it and how to write programs to take advantage of it.
See Agner Fog's blog for performance programming.
Under the hood, a CPU has more than just one 'adder'; it has several ALUs. SIMD is aligning those ALUs to do the same thing (add) at once (or double pump). The cost is generally power, and some developer setup to align the data to the boundary (ignoring unaligned SIMD on Intel).
In short, in assembly the easiest example is a memcpy: while (addr & 0x3) copy_byte(); while (addr & (16-1)) copy_uint32(); then copy via SIMD.
SIMD was first implemented in a computer in 1966. I wouldn't exactly call it "new". And even the modern variant with SSE was introduced in 1999.
Basically all they are is that instead of only using two registers for e.g. an add instruction, you instead first load all the data you want into special arrays of registers which the operations then get applied on. After that you can move out your result
Yes please; also, how to do it on Apple silicon. I tried some SIMD code on ARM 64 and couldn't get it to work.
@@sinom it's crazy how some concepts are typically seen as new, even though they're old as shit: SQL, SIMD, FP (Lisp is from the 50s), etc.
Edit: I have been guilty of this too, btw
The flexibility of unions in C++, however, is severely reduced because of stricter safety rules, so you usually have to work with some ugly reinterpret_cast syntax to make it work.
Interesting, thank you for your introductions to Unions. I will experiment with it and understand it 100%
The first time I saw unions being used realistically was in some code for an LED light controller. The union had a struct with one byte per color, a 4-byte RGBI value, and a 4-byte array where each position was one part of RGBI.
Being an embedded developer, unions are a daily thing for me.
However, there is one little thing I'd like to add - when you discuss the size of your json_t and say that the enum takes up only one byte... while that can be accurate, it isn't necessarily so. It depends on the compiler and its settings - I have worked with compilers where an enum always takes up 32 bits, as this is the native word size of the target architecture. In other cases, the minimum number of bytes needed to represent all values in the enum is used.
I've often used unions like in your last example. Usually I'd ensure correctness by using functions or macros to set both the discriminant and the value, just like your printJSON function ensures that you're correctly interpreting the data.
I'd never seen unions used to provide multiple ways to read/write the same underlying data though, like with your IP address and hardware register examples, that's really neat! It makes unions an effective way to avoid the "primitive obsession" antipattern, beyond a simple "typedef int ipv4_addr".
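A minimal sketch of an IP address type in the spirit of the video's example. The name ipv4_addr and the helper are hypothetical. The octets view is byte-order independent, while the numeric value view depends on host endianness, as noted elsewhere in the thread.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical IPv4 type: one 32-bit value, or four octets in dotted order. */
typedef union {
    uint32_t value;      /* numeric value depends on host endianness */
    uint8_t  octets[4];  /* octets[0] is the first number in dotted notation */
} ipv4_addr;

/* Format as dotted decimal from the octets; "255.255.255.255" plus NUL fits in 16 bytes. */
static void ipv4_format(ipv4_addr a, char buf[16])
{
    snprintf(buf, 16, "%u.%u.%u.%u",
             (unsigned)a.octets[0], (unsigned)a.octets[1],
             (unsigned)a.octets[2], (unsigned)a.octets[3]);
}
```

Passing ipv4_addr instead of a bare uint32_t is what gives the type safety against the "primitive obsession" mentioned above.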
Awesome vid! I remember when my prof was talking about unions I never really got it. I can’t believe it took me this long to actually learn it for real lol
Man, Your work is nice.
It should enter into Library of Congress.
Unions are awesome! I had a fun bug with one, though: on ARM you have to tell the compiler to pack the members, otherwise padding bytes get inserted in the middle.
I didn't know C had unions! That is so cool!
You should do a video on bitfield structs, with variable width fields. Section 6.9, page 149 in K&R.
Thanks for your insightful information about C.
Because of my bad hearing, I struggle with the background music...
In embedded, unions are super useful for setting individual bits and bitfields within a word.
I think the first time I discovered unions was when I was trying to send a floating-point value over I2C. I spent way too long in the 'let's just convert it into a string, send those bytes, and then reinterpret that as a float on the host controller' phase.
Unions just let you solve that whole mess by only transmitting the 4 bytes or however many make up that datatype
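A minimal sketch of the union approach for sending a float as raw bytes. The I2C transfer itself is left out, and float_pack/float_unpack are hypothetical helper names. It only round-trips correctly if both ends use the same float format and byte order.

```c
#include <stdint.h>

/* View a float as its raw bytes. */
typedef union {
    float   f;
    uint8_t bytes[sizeof(float)];
} float_bytes;

/* Sender side: fill the transmit buffer from a float. */
static void float_pack(float value, uint8_t out[sizeof(float)])
{
    float_bytes fb;
    fb.f = value;
    for (unsigned i = 0; i < sizeof(float); i++)
        out[i] = fb.bytes[i];
}

/* Receiver side: rebuild the float from the received bytes. */
static float float_unpack(const uint8_t in[sizeof(float)])
{
    float_bytes fb;
    for (unsigned i = 0; i < sizeof(float); i++)
        fb.bytes[i] = in[i];
    return fb.f;
}
```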
I worked on a machine code executor for a school project.
Unions saved my butt for "converting" an unsigned short into an array of unsigned char.
It is interesting that C can already do so many things just using structs, while unions are pretty restricted to the use cases where they really make sense. So even though I knew about them when I was a newbie doing tutorials, I didn't really know where and how to use them. It was only when I faced a problem that really needed a union that I learned how to use one.
I usually use unions so I can access certain struct members as either their variable name or their array index:
union { type name[3]; struct { type name1, name2, name3; }; };
useful for things like gamepad buttons so I can loop over all buttons but still use their individual names (of course the same could probably be achieved with defines, or by using a pointer, but after the optimizer has done its thing who knows the packed order without messing with pragma pack) and I've used them in random number generators too to overlap bytes and create some wacky seeds for RNG
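A minimal sketch of the named-or-indexed gamepad idea, assuming a hypothetical four-button pad. The _Static_assert guards against the struct ever being padded differently from the array, which is the worry the comment raises about pragma pack.

```c
#include <stdbool.h>

/* Hypothetical gamepad state: buttons readable by name or by index. */
typedef union {
    bool button[4];
    struct {
        bool a, b, x, y;   /* same type, so no padding is expected; checked below */
    };                     /* anonymous struct (C11) */
} gamepad_buttons;

_Static_assert(sizeof(gamepad_buttons) == 4 * sizeof(bool),
               "named buttons must line up with the array");

/* Loop over every button without naming each one. */
static int count_pressed(const gamepad_buttons *g)
{
    int n = 0;
    for (int i = 0; i < 4; i++)
        n += g->button[i];
    return n;
}
```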
I always wondered who would use a union, but this video opened my eyes to their power.
Other computing languages have used union-type structures and syntax for decades. For example, COBOL has the REDEFINES clause, which does exactly the same thing. It is possible that this feature was added to C in part to enable a C program to interact with software and data from other computer languages. 30 years ago, I was involved in an EDI (Electronic Data Interchange) project involving the receipt of purchase orders and the sending of invoices via X.400. One computer was an IBM mainframe, the other was a Unix minicomputer. X.400 was an expensive protocol, so EDI was designed to send the required data in the smallest packets possible, and made heavy use of REDEFINES and unions to accomplish this.
I have absolutely heard of and implemented unions in C before, and the fact that this video will not stop showing up on my YouTube really annoys me. I came to comment for 2 reasons:
1. This is the only video that bothers me, I love your other content
Found out about unions when I was making my final project for CS50x a few days ago. Used one in one of my structs for my fighting sim program.
I'm always using them when programming MCUs. Sometimes I need some tricky data rearrangements, and unions help me.
Using a union to alias an anonymous bitfield struct with a byte array and a dword is a great trick to deal with peripheral registers instead of using bitmasks so long as you can guarantee that the struct will be packed properly.
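A minimal sketch of that register trick, using a hypothetical 32-bit control register. The _Static_assert is one way to "guarantee the struct will be packed properly" at compile time. Note that uint32_t bitfields and the bit order within the word are implementation-defined, so the field-to-bit mapping must be checked against your compiler.

```c
#include <stdint.h>

/* Hypothetical 32-bit peripheral control register with three views. */
typedef union {
    struct {
        uint32_t enable   : 1;
        uint32_t mode     : 3;
        uint32_t reserved : 12;
        uint32_t divider  : 16;
    };                      /* anonymous bitfield struct (C11) */
    uint8_t  bytes[4];
    uint32_t dword;
} ctrl_reg;

/* Guard the packing assumption at compile time. */
_Static_assert(sizeof(ctrl_reg) == 4, "ctrl_reg must be exactly 32 bits");
```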
thanks for the tips !
I also use structs of function pointers to get something close to C++ in C.
I knew about unions but try to avoid them, because they require extra care in thinking about how you are physically organizing memory in your code (think of an array of unions, for example). In my opinion, another hidden gem in C is the "reserve" keyword. Thanks a lot for the video, high quality as usual!
reserve isn't a keyword in C. Did you mean register?
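The struct-of-function-pointers approach can be sketched like this: a hand-rolled "vtable" standing in for C++ virtual methods. The shape types and functions are hypothetical examples.

```c
/* A table of function pointers, one per "class". */
typedef struct shape shape;
typedef struct {
    double (*area)(const shape *self);
    const char *(*name)(const shape *self);
} shape_ops;

struct shape {
    const shape_ops *ops;   /* per-type operations table */
    double w, h;            /* hypothetical shared data */
};

static double rect_area(const shape *s)      { return s->w * s->h; }
static const char *rect_name(const shape *s) { (void)s; return "rect"; }
static double tri_area(const shape *s)       { return 0.5 * s->w * s->h; }
static const char *tri_name(const shape *s)  { (void)s; return "triangle"; }

static const shape_ops rect_ops = { rect_area, rect_name };
static const shape_ops tri_ops  = { tri_area, tri_name };

/* Dispatch through the table, like a virtual call. */
static double shape_area(const shape *s) { return s->ops->area(s); }
```

Unlike the tagged-union approach, new shape types can be added without touching existing code, which is the open-set versus closed-set trade-off mentioned earlier in the thread.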
@@rz2374 The description of a "hidden gem" made me think of restrict.
@@rz2374 Hopefully not, because register is mostly deprecated and the compiler will usually ignore it.
@@rz2374 he probably meant restrict
I think you mean "restrict", not "reserve"
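For reference, a minimal sketch of what restrict looks like in practice (scale_into is a hypothetical function). The qualifier is a promise from the programmer that dst and src do not overlap during the call. That can let the compiler vectorize the loop without reloading src after each store, but breaking the promise is undefined behavior.

```c
#include <stddef.h>

/* dst and src are promised not to overlap for the duration of the call. */
static void scale_into(float *restrict dst, const float *restrict src,
                       size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```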
The X Toolkit Intrinsics (the toolkit Motif is built on) used this trick, but with the type identifier being the first member of each struct in the union.
I was doing this union/structure stuff 20 years ago... so powerful... especially as an embedded C programmer... picking out specific bits of a byte... setting/clearing, etc.
I had come across unions during my first-year engineering class, but there wasn't much focus on them; instead, all the attention was on structures (due to their use in Data Structures). I always wondered why we need them, and even most websites' explanations of them being useful for type conversion felt silly, as I can do that normally using format specifiers. But this is the first time I found the actual practical usage of them, especially the polymorphism part, which can be used in some DS experiments involving conversion between postfix, prefix, and infix expressions.
Unions are pretty damned awesome, especially for doing what you did with the register or the IP address: you get a wild array of bytes and then use a union to describe the structure of those bytes.
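A minimal sketch of the "type identifier first in every struct" variant, with hypothetical event types. Because every member struct starts with the same int type, that field can be read through the union's plain type member no matter which struct was last written.

```c
#include <stdio.h>

enum { EV_KEY = 1, EV_MOTION = 2 };

/* Every event struct begins with the same 'type' member. */
typedef struct { int type; int keycode; } key_event;
typedef struct { int type; int x, y; }    motion_event;

typedef union {
    int          type;     /* always readable: shared first member */
    key_event    key;
    motion_event motion;
} event;

static int event_describe(const event *e, char *buf, size_t size)
{
    switch (e->type) {
    case EV_KEY:    return snprintf(buf, size, "key %d", e->key.keycode);
    case EV_MOTION: return snprintf(buf, size, "motion %d,%d", e->motion.x, e->motion.y);
    default:        return snprintf(buf, size, "unknown");
    }
}
```

Compared with a separate tag outside the union, each struct here stays self-describing even when passed around on its own.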
Always wondered what these were...definitely see how these could save some instructions here and there. Thank you.
I have used unions to serialize floating-point numbers as their byte representation. Just write the float to the float part of the union and then read its byte-array part when I need to serialize it. I just need to keep endianness in mind (but that's mostly a non-issue for me, since the communicating system is always the same MCU).
Unions are super useful for communications.
You can save the CRC and Opcode and a union with the payload. Then depending on what the opcode is you can read the union in different ways.
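A minimal sketch of opcode-plus-payload dispatch. The opcodes, payload layouts, and handle_message are hypothetical, and the CRC algorithm is not shown. This describes the in-memory form; decoding wire bytes into it is safest with explicit byte reads, since struct padding and byte order vary by compiler and CPU.

```c
#include <stdint.h>

enum { OP_SET_SPEED = 0x01, OP_SET_LED = 0x02 };

typedef struct {
    uint8_t opcode;                   /* decides which payload member is valid */
    union {
        struct { uint16_t rpm; } set_speed;
        struct { uint8_t index, r, g, b; } set_led;
        uint8_t raw[4];
    } payload;
    uint16_t crc;                     /* over opcode + payload; algorithm not shown */
} message;

/* Dispatch on the opcode, then read the matching payload view; -1 for unknown opcodes. */
static long handle_message(const message *m)
{
    switch (m->opcode) {
    case OP_SET_SPEED:
        return m->payload.set_speed.rpm;     /* e.g. apply a new motor speed */
    case OP_SET_LED:
        return m->payload.set_led.index;     /* e.g. update one LED */
    default:
        return -1;
    }
}
```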
I've used unions for creating generic types in my program and serialization.
You just proved that they are very useful functionality, not just some hidden peculiar oddity that exists solely because the programmer had spare time to fool around.
Something I love to do with unions is treating a multidimensional array as a single linear array for quick one off tasks that would only require a simple for loop instead of a set of nested for loops. Makes rereading my code easier and thinking a lot easier too
Depending on the use case, this might come with a massive performance hit since it bypasses some cpu and cache optimizations available for dealing with 2D data
@@Songfugel do you mean there are optimizations for, say, double[][], vs a double[] with the same number of elements? Can you give me any pointers on that?
@@user-sl6gn1ss8p Yes, you can search for CPU matrix optimizations (2D arrays are matrices), and also for how keeping the loops nested, so that the work in the innermost loop stays within cache limits, can make the operations much faster and more cache-friendly.
Nested for loops done correctly (you don't branch, and you jump out whenever the job for that loop is done) are not bad in some contexts, like 2D number crunching.
There is an amazing video about it by DepthBuffer on YT called something like "nested loops can make your code faster"
However, not to mislead, I have to point out that nested branches (like nested IF statements) can be very very bad
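A minimal sketch of the 2D-as-linear-array trick from the start of this thread, with a hypothetical 3x4 grid. Both members are arrays of int, so they overlap element for element in row-major order; as the replies note, whether a flat loop is actually faster depends on the workload.

```c
/* A 3x4 grid that can also be walked as 12 consecutive ints. */
typedef union {
    int grid[3][4];
    int flat[12];
} grid3x4;

/* One simple loop instead of two nested ones. */
static long grid_sum(const grid3x4 *g)
{
    long sum = 0;
    for (int i = 0; i < 12; i++)
        sum += g->flat[i];
    return sum;
}
```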
I consider the union an early precursor to polymorphism, with the capability of adding a simple RTTI concept by wrapping a union and an enum together in a struct; many people use unions this way. It's also the easiest and fastest way to understand polymorphism and its benefits.
They're incredibly useful in game programming since they're the basis of variant data types. Love me a delicious union
I use them to create a bunch of aliases for the members of vector and matrix types.
typedef union{struct{float x,y,z;}; float v[3];} Vector3;
typedef union{float m[9]; Vector3 row[3];} Matrix33;
typedef union{float m[12]; Vector3 row[4]; struct{Matrix33 mtx33; Vector3 translation;};} Matrix43;
Very nice when you need to pass a portion of a composed struct as an argument to a function, or access elements with a loop iterator instead of by individual names. The only drawback is that it clutters the debug watch window.
I remember using this in Arduino. I wanted to give individual names to my digital output pins, but also iterate through all the pins in a single loop.
This was great timing after just learning about Unions in Zig while doing Ziglings last night
They're basically a must for embedded, iot and industrial micro controller programming
Randomly learned about unions through Unreal Engine years back. Vectors are defined structs with a union over the member types.
I'll watch this to hear you out, but I learned about unions the same day I learned about structures, in 1996.
A very good explanation of unions with nice examples.
I do wish to point out that, unless you pack a structure, its size will surprise a few people. Structures, arrays, and types are usually aligned to word (register size) boundaries, which are implementation dependent.
I once found a bug that wasn't, because the array containing a string was automatically aligned. The character array was declared with size 10 and contained 10 characters. The issue was that it represented a string, which is supposed to be null terminated. The compiler actually aligned the array to 12 bytes, not 10, so it worked because the last 2 bytes were 0. Modifying it to be UNICODE compatible made it blow up, because 10 UNICODE characters were already aligned and there were no extra bytes to hide the mistake.
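A quick way to see the padding this comment describes is offsetof. In this minimal sketch (struct unpacked is a hypothetical example), the exact numbers are implementation-defined, though on common ABIs value usually lands at offset 4 and the struct is 8 bytes rather than 5.

```c
#include <stddef.h>

/* The compiler may insert padding so each member meets its alignment. */
struct unpacked {
    char tag;      /* 1 byte */
    int  value;    /* usually needs 4-byte alignment, so padding follows 'tag' */
};

/* Where 'value' actually lands inside the struct. */
static size_t value_offset(void) { return offsetof(struct unpacked, value); }
```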
Huh.... food for thought. I've never used unions, but I can see how they can be extremely useful in some circumstances. Without unions I would approach those problems with bit-wise manipulations.
By the way, excellent presentation.
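Yeah, you can use a union to do some pretty cool stuff. One is quickly reading out the binary data of floats:
#include <stdio.h>
#include <inttypes.h>
union u {
    float f;
    uint32_t ui; // assumes a 32-bit IEEE 754 float
};
int main(void) {
    union u val;
    val.f = 1.0f;
    printf("%" PRIx32 "\n", val.ui);
    return 0;
}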
output: 3f800000
One way I used unions a while ago was to interpret a 32-bit color both as a uint8_t array and as 4 named 8-bit uints, using:
union RGBA32 {
    uint8_t RGBA[4];
    struct {
        uint8_t R, G, B, A;
    }; // anonymous struct (C11)
};
they're unionizing
I'm interested in cyber security. I know a little C and started to learn assembly. I like your low-level explanations of things and I'm interested in learning more. Sorry I can't afford your course, but your effort deserves something.
This kind of reminds me of that Quake 3 algorithm. Instead of all that crazy casting he does, he could have used a union. Neat!
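For comparison, a minimal sketch of the Quake III fast inverse square root written with a union instead of pointer casts. It assumes 32-bit IEEE 754 floats and keeps the original's magic constant and single Newton-Raphson step, which gives roughly 0.2% accuracy. Today a plain 1.0f / sqrtf(x) is usually the better choice.

```c
#include <stdint.h>

/* Approximates 1/sqrt(x) for positive x. */
static float fast_rsqrt(float x)
{
    union { float f; uint32_t i; } u = { .f = x };
    u.i = 0x5f3759dfu - (u.i >> 1);              /* magic-constant initial guess */
    u.f = u.f * (1.5f - 0.5f * x * u.f * u.f);   /* one Newton-Raphson step */
    return u.f;
}
```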
I've known unions as a feature, but never how to use them. The embedded one is simply genius and makes everything easier. It's also amazing for network packets/frames.
Just noticed that 42 is * in ASCII. 42 truly never ceases to amaze.
This reminds me of TypedArray in JavaScript. You are essentially interpreting a buffer in different ways.
I can see myself using this, I forgot that the union exists, then again I don't use C daily. Great video
I used unions a lot in my previous job.
The system's learning data would be flashed into ram all at once, and then we'd break it down into smaller and smaller structs depending on the "module" within the system.
Because of legacy code, some of the structs could have slightly different type definitions across the code base. Using unions is a slightly more structured way of accessing that memory, as opposed to casting void pointers into the type that you're expecting.
Really well-presented video. I really like unions and have used them in some shape or form for most of my embedded programming, although I have a lot of code that can't use them as cleanly as I would like because I have to worry about endianness, but that's just how it goes.
I didn't know that unions operated like this! I've only read about them since I was curious about C's 32 keywords and had never seen this one used. I thought it only contained one type at a time so it's useful to know how they actually work! Thanks for this!
Case 3 is called a sum type in ML languages like SML, Caml, or OCaml.
The possible values are the union of the values of the different member types but... no tag is automatically included in C/C++. In ML languages, the tagging is automatic ensuring safe execution.
Unions are basically a syntactic sugar to prevent excessive amounts of explicit casting.
I'll be sure to check out the course on the website.
I find unions very handy for all of the use cases you mentioned and more. As for goto, I generally abhor it, but in certain instances it is just the cockford Ollie, and the code is more logical than the spaghetti you'd write to avoid using one. At the end of the day, your code should be as simple as possible and no simpler!
Dude, I'm taking a MicroP class right now and Unions are EVERYWHERE. Pretty much all configuration registers for devices are stored in some sort of union.
One of my biggest worries about learning C was that there would be no polymorphism; this is very relieving.
I use unions to break down types like floats and doubles for complex conversions, as well as to dynamically access arrays and heap memory. I think it's great practice for any budding programmer to learn to use them, so it's sad to hear they're relatively obscure.
They're super useful even in C++ (even though much of their usage there is UB), and much faster than a lot of other built-in functions and features.
The only time that I have used unions was to overlay registers. So, I could access EAX or AX depending on whether I needed the 16 bit or 32 bit register, but that was a long time ago and it was only for a hobby project.
For type punning, be sure to use #pragma pack(push, ...) and #pragma pack(pop). Then also add an #if defined check for the processor name, so that if the code is compiled for a different processor the endianness can be checked. C++ labels it "undefined behavior", but most compilers have a defined way of handling it. C++ lacks reflection, so serialization any other way takes a lot of code. With a union, only arrays and strings have to be handled separately. Also note that the structures in the union can't be a class, but can be an aggregate.
I would have thought C uses them a lot more, just because Rust is so obsessed with enums (which are essentially a type-safe version of polymorphic C unions, where the compiler ensures you are not accessing the wrong variant and the tag byte is not directly exposed).
At 0:41: you should put floating-point numbers first in unions. I had a compiler complain at me before when I didn't.
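One plausible explanation for that complaint, offered as an assumption rather than a diagnosis: in C, a brace initializer without a designator initializes the union's first member, so { 2.5f } on a union whose first member is an int silently converts to 2, and some compilers warn about it. A designated initializer (C99) removes the need to reorder members. The number type below is a hypothetical example.

```c
typedef union {
    int   i;
    float f;
} number;

static const number a = { 3 };          /* no designator: initializes the first member, .i */
static const number b = { .f = 2.5f };  /* designated: initializes .f regardless of member order */
```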
I'm confused. Thought unions were well known. :) Now back to watching the video
Personally, I find unions most useful when dealing with CAN Network packets, since they can be so easily represented using unions (as they use stuffed bitfields a lot of the time)
I have used unions in the past for MIDI data to be able to efficiently break apart each byte in the message
I've used something similar to the polymorphism example when sending messages between two processes while working with QNX for an RTOS course
Unions are used in quite a few places. One place where they are very useful is messaging. There is a common messaging API used by all cores and tasks. The message size is fixed, and each message carries a payload. Each task has a different representation of that payload. So the overall structure/union is what the messaging API accepts, but it is up to each task what that data represents.