I use something like this for handling the registers of an embedded device:
struct {
    // stuff...
    union {
        struct {
            uint16_t foo;
            uint16_t bar;
            uint16_t baz;
            // etc...
        };
        uint16_t regs[REGISTER_COUNT];
    };
} the_device;
The communication protocol requires the registers to be identified by numbers (hence the array), but the source code is more readable if I can write the_device.foo rather than the_device.regs[INDEX_OF_FOO].
I recently wrote a (pretty bad :D) JSON parser as a training exercise and was wondering how to best implement the dynamic typing... Ended up just using void pointers and lots of casting... Looks like unions would've been the proper way to do it. Good to know!
Hi, thank you very much for your videos. All the videos I've seen from your channel are pure gold. Thank you for so much, sorry for so little. Greetings.
I only learned recently the hard way and very quickly that using a union is the equivalent of typecasting the variable, not the literal being assigned to it, i.e., ((type) variable) = value, not variable = (type) value. The reason I found out was because I was writing as uint32_t and then reading as uint64_t. In debug, I noticed the low 32 bits had the correct value, but not the high 32 bits. Sure enough, it turns out I had mistakenly believed when I first was introduced to C that it had the same behavior as casting the assignment value. This differs in that, at least for x86, casting a uint32_t to uint64_t would generate a movzx (move with zero extend) instruction, whereas with a union instead, we just get a movd instruction (move dword).
It was for creating a simple output parameter that gives two status codes: general_cause and specific_cause. This would permit the user of the procedure to determine what should be done with the output based on the status codes given, in this case, for an HTTP API that I implemented using WinHTTP. The solution was to set the values to zero at the start of each procedure that provides these output parameters.
Certainly an improvement over exceptions to use this simple method, but unfortunately, C doesn't have a nice way of clearly separating input parameters from output parameters, and polluting the return values with interprocedure state metadata is absolutely unacceptable as the return value should return the result of its execution. Things that are inherently void type shouldn't return a bool or otherwise for information about operation success or failure, because that is accidental to the procedure and makes the procedure interface ambiguous in the best case, and misleading in the worst case. It's incredibly cringe to see if (!WinHttpProcedureThatReturnsNothing(...)) { fatalf(...); } as opposed to WinHttpProcedure(..., &status); if (status.general_cause == HTTP_FAILURE) { fatalf(...); }.
I use unions like these when I'm working on embedded systems and, for example, need to transmit a uint64_t variable via infrared LED to an A/C. This uint64_t contains multiple commands and parameters. With a union you can use a single uint64_t of memory and divide it into multiple uint8_t, uint16_t, uint32_t variables, or even define the number of bits for each member inside this variable. Each member is a command or a parameter for the A/C. You are actually sending a set of mode, temperature, fan speed, etc. in one uint64_t variable.
I have used unions like this in the past. In fact, major vendors like Texas Instruments use this technique in their hardware abstraction libraries. But using this in the networking stack is a bad idea. This technique relies on implementation-defined behavior, which makes your code less portable. Depending on your business priorities, it might be OK. But in the long run, it causes huge issues. One of my previous codebases made heavy use of this technique. Due to cost and availability issues, we had to change our microcontroller and we ended up having to rewrite most of our firmware.
Total newb, but I know what unions are for and it honestly blows my mind how clever it is. The common understanding I have seen in videos is that it saves memory, but you can also intentionally do variable shadowing or have a variable "interpreted as" another type.
Writing my own Base64 conversion function was a great place to trot out a union. The 24 bits of binary expand into 32 bits of printable ASCII, and vice versa... You gotta be a bit careful about 'endian', but it's a clean solution lacking the usual plethora of 'temp' variables and oddball loops... And, on my 32-bit system (for my own use), I sometimes make a union of a (native) double with 2 ints or a pointer and an int, or whatever, allowing functions to return multiple values melded into a single package. Yes, it's a non-portable hack, and one needs to be careful, but it partially overcomes C's aversion to passing structs...
I don't understand why you wrote "typedef" for the struct. Whenever I address a struct I instantiate using its name. Example: struct foo { int x; }; I would just use: foo myFoo; And go on with my day and it works fine. I never had to write "struct foo myFoo;". Care to explain?
The required MISRA C rule 18.4 from 2004 says that unions shall not be used. MISRA C++ 2008 says the same in rule 9-5-1. In the MISRA C edition from 2012 this rule is lowered from "required" to "advisory" (19.2). Personally, I also try to avoid unions. But there are some situations where a union is the only or the best way. Some examples are shown in the comments. For your example, Jacob, a C++ programmer would advise using polymorphic classes. In C, I could imagine using a struct with pointers to the specific data (personality, firmware). Only one of these pointers should be unequal to NULL. With this, you can also save the boolean flag which distinguishes between robot and human being. Nevertheless: this is again a great episode which helps me to think about data concepts. Thank you!
Hi Jacob, great video. If you are looking for video suggestions, one I'd be interested in is why there seems to be such a holy war between using typedef and not using it. It seems like half the people I ask are vehemently against them, and the other half use them all the time. I'm not totally sure I understand the reason against. Thanks
Thanks for the video! Like some people mentioned in the comments, another use case that arose from this shared memory feature is data packing/unpacking in serial communication. I found a technical article that summarizes it nicely: "Union in C Language for Packing and Unpacking Data" - by Dr. Steve Arar
Just realized I could use this in project. I had a Header struct and two different node structs that had the header as the first member plus different data arrays that occupy the same space. Now I unioned the data arrays and no longer have to access the header members through object->header.member. One little issue is that I need to know the size of the header to determine how big the data arrays can be and I'm not sure of the best way to do that... I want the struct to always be one memory page in size.
I just wrote a program in notepad and have not tested it but you will understand the basics of what I am trying to do.
#include <stdio.h>
enum TYPE { RECT, CIR };
// Child Structure - 1
struct Rectangle {
    // Width & Depth
    float b, d;
};
// Child Structure - 2
struct Circle {
    // Radius
    float r;
};
// Parent Structure
struct Shape {
    union {
        struct Rectangle R;
        struct Circle C;
    };
    enum TYPE type;
};
// Function to calculate Area
float calculate_area(struct Shape s) {
    if (s.type == RECT)
        return s.R.b * s.R.d;
    else
        return 3.14f * s.C.r * s.C.r;
}
int main() {
    struct Shape shp;
    shp.type = RECT;
    shp.R.b = 10;
    shp.R.d = 20;
    printf("Area = %f\n", calculate_area(shp));
}
IMO the unions rule when you handle some control protocol. You have two dozen different commands each with different parameters, so you make a struct command, union parameters, to pass the command around.
I use union in 16-bit mcu when I need to read the MSB or the LSB separately. I just think it is more readable than doing pointer arithmetic: typedef union { int word; char byte[2]; } data16_t;
Thanks for the video. Throughout my CS degree, unions were kinda skipped over lol, so I had no idea the difference. Only question I have is how it determines which is bigger when given pointers? Does it assume 4 bytes for the pointer or follow the address and use sizeof() internally?
Pointer size varies by machine. Some microcontrollers have 2-byte or 4-byte pointers. Your typical 64-bit machines (my laptop and probably yours) have 8-byte (64-bit) pointers. And, yes, on a particular machine, the size of a pointer is the same no matter what it points to.
It feels like conceptually we could replace the idea of a struct with an interface like in Java or a protocol in Swift, or in some other way via subclasses. Obviously the point of C is to not have classes, but still, just a thought.
OK, so unions are basically for when only one of several variables is in use at any given time; in that case they are a good way to optimize the code a bit more ^^
One thing I am confused by is what is the difference between accessing a struct property via -> vs dot notation? I've Googled a bit on this and it seems like they are not exactly interchangeable depending on whether or not the property is a true pointer.
You use “->” when you have a pointer: it's either a_struct.a_member or a_pointer_to_struct->a_member. The latter is just syntactic sugar for (*a_pointer_to_struct).a_member.
A small note on how the endianness of CPU affects the alignment of elements in the union would be helpful. I believe you might have already covered it in one of the videos, but would be useful here for someone new, since we are already in this context. FYI, my previous organization's 20+ year old firmware (still evolving) has this common datatype which consists of a structure of union of all datatypes like int, bool, float, string, etc in it. The variable of this complex datatype is thrown around everywhere in a highly complex-interdependent environment, over network and across various platforms, which seems very complex at first but is a lifesaver once you understand it. The actual datatype of this variable, and the value it holds is evaluated only at the source and the intended destination. I think it's like a simple encapsulation.
I wanted to say the same. You can also mix bitfields in with it too (perfect for HW registers), so you have nice setter/getters for the fields while also able to read/write the whole 16/8bit value at once. Makes code a lot easier to read too.
Maybe, maybe not. It depends on what you're doing with those pointers, but the pointers will share the same location, so writing to one and then to the second will overwrite the first. So, if you access the first, you'll get the value you stored for the second. If that first pointer was important, you might have lost important data (and possibly leaked memory). But whether or not it seg faults depends on what address you stored for the second one, and what you try to do with it.
@@JacobSorber I haven't actually programmed for 20 years or so; I used unions for the flexibility they give you. I was under the impression that, as far as the pointer is concerned, it is the same, but the compiler's interpretation of it varies with the member (variable) used in the source code. Is this behavior still correct?
Anyone could tell me why the size of the struct is 12 bytes, when: int = 4 bytes, float = 4 bytes and char = 1 byte? The other 3 bytes are used for what?
I'm personally against unions. They're fun and powerful, but their downside is that the system endianness also matters, meaning that little- and big-endian machines would produce different results. If possible, I'd always prefer bitwise operations over a union, and in C++ std::variant may also come in handy rather than using a union and a type.
Structs vs C++ tuples? My preference is structs/classes (if necessary) by default as tuples make for a weird mess if not used properly in addition to a bunch of templating madness.
I can see endianness only matters when using a union for things like joining smaller registers into a larger register, e.g. union reg_pair { struct { int8_t b, c; }; int16_t bc; } mypair; On a little-endian system this would swap the two 8-bit registers being used together, but not on a big-endian system. If you're using the union to merge different data together then endianness isn't going to matter. I used a union to merge the different data-section types in a file: each record was the same length but, depending upon a record header, the actual data stored differed: read the record, check its header and access the data appropriately.
I use a union when my program has to convert e.g. 4 chars into an int32_t (maybe an IPv4 address), which makes it very easy without a bunch of bit fiddling. ;o) Another good reason is if you need a kind of variant variable type.
@@267praveen This is a generic 4 bytes to int converter
#include <endian.h> // To get __LITTLE_ENDIAN
#include <stdint.h> // For uint8_t, etc. typedefs
// toInt() bytes to int converter.
typedef union u_char2Int {
    char ac4Bytes[4];
    uint32_t uint32;
} t_char2Int;
/*******************************************************************************
 * Name: toInt
 * Purpose: Converts up to 4 bytes to integer.
 *******************************************************************************/
int toInt(char* pc4Bytes, int iCount) {
    t_char2Int tInt = {0};
    for (int i = 0; i < iCount; ++i)
#if __BYTE_ORDER == __LITTLE_ENDIAN
        tInt.ac4Bytes[i] = pc4Bytes[i];
#else
        tInt.ac4Bytes[i] = pc4Bytes[iCount - i - 1];
#endif
    return tInt.uint32;
}
Haven't really watched the video yet, but be careful: in C++ (don't know about C) a lot of the ways you use a union are UB. When you assign a value to one of its fields, you should only use that field.
I have a question :) What is the difference/similarity between a struct in C versus a hash-table or maybe even 'object'?? I'm more of a 'head learning' place rather than actually programming. From what I have in my head is that the struct may be like a hash-table or object from a language like C compared to C++ or using a scripting language like PowerShell in Windows Thank you
A struct is like a class (object) without methods (member functions in C++ lingo) . In C++ struct and class are almost the same. In a hash table lookup is happening at runtime, while for a struct the compiler knows exactly which member variable is located where in memory. You certainly can use structs to build a hash table.
Could you use this to store a large data packet and access each byte separately? Something like this: union { uint32_t data; struct { char b1; char b2; char b3; char b4; }; }
Yes, sort-of. You can easily run into endianness issues if you're not careful, and make sure you pack the struct to keep your compiler from inserting padding between the members. But, otherwise, yes this will work.
In this case, an alternative would be to use a void pointer as void* versionOrPersonality. Cast it to the correct type if the isRobot flag is set to true. I recall that we were taught to almost never use unions as in today's scenario, they are considered bad code. A place where I've seen it is to extract bits from a number.
Not that I have a lot of experience but I am yet to see production code that uses unions. And that example shows once again why one should switch to C++: Just have a factory class Character with two children Person and Robot.
main() { char *p = 's'; printf("%s",p); } It's printing nothing. As soon I changes format specifier to %c. It works. Can anyone tell me the logic behind this.
's' is a char (a usually 1-byte integer). "s" is a char* (contains the address of a block of memory containing the 's' character followed by a null character). You're assigning p (a pointer) to be that integer value (rather than an address). With "%c", you're telling printf to interpret p as a character. So, the code is odd, but things do work, because both lines are forcing p to be used as a char. With %s, you're telling printf to treat it like a pointer to some characters, and since you didn't set up p that way, you could get a number of different outcomes (most likely no output, garbage, or a seg fault), depending on what is stored at address 115.
No. In the code given, when the char "personality" of hanssolo is assigned, it is assigned[1] as a pointer to the constant string (and the version_type contains this address as an int), but the char "personality" of r2d2 contains the value 42 (a very likely segmentation fault if it is accessed). [1] at least it was when I did most of my C programming back in the '80s and '90s.
@@Hellohiq10 Even for c. There's a project where an OS is being built with rust so the language is pretty capable. Plus, it's time to have a modern language for embedded systems. C is ancient and should just die.
In C++ you don't... you use std::variant. (Unions in C++ just have too many corner cases with undefined behavior.) So it is better to limit your explanation to "C" only.
@@homelikebrick42 That's my point exactly! Names are hard, but without using a name for the functionality being demonstrated makes it difficult to remember the concept (for people like me at least :^). Categorization is important.
Unions are a case where I believe the verbosity of typing "union foo" (vs hiding behind a typedef) is a feature, not a bug.
When you use a union, you are deliberately making the decision to refer to one memory location by multiple names. That's unusual enough that I want the "union" keyword front and center to draw a reader's attention.
This is good for more complicated unions, but probably not necessary for the simple ones.
I'm making a C api for a popular service and there are a lot of potential responses that all need a separate struct to retain the data. There are also many different enums, and removing all the typedefs for structs and enums significantly improved the readability of everything. I know it's a bit verbose, but for larger projects, I definitely recommend not typedeffing structs and unions
I used unions in a Sega Master System emulator I'd started working on: the Zilog Z80 microprocessor has 8-bit registers that can pair up and act as though they were 16-bit ones. With the help of unions and structs, I could manipulate such registers both ways without the need for bitwise operations and further logic. Quite useful and clever 😁😁
I love how whenever I stumble across a question about C/C++ essentials, you're posting a video about that exact thing.
Yeah, inter-language bindings has been on my topic list for a while. Just need to get around to making the videos. 😀
@@JacobSorber hello sir your way of explanation is better than paid classes and I am really sorry to compare with them ..
I had a small doubt in union topic when I execute the below code
#include <stdio.h>
#include <stdint.h>
#pragma pack(1)
typedef union
{
uint32_t* p;
uint8_t* q;
} somepointer;
int main(int argc, char** argv)
{
uint32_t r;
uint8_t s;
somepointer p;
r = 10; s = 15;
p.p = &r;
p.q = &s;
printf("%d %d\n", *p.p, *(p.q));
}
I am getting answer 2575 15 but i expected both 15 as output what is difference when we use pointer member in union how the memory is handled .
The union members p and q each store only an address (just 4 bytes on a 32-bit system). In your main program you set the p field of the somepointer and then you set the q field, which overwrites the address in the p field since p and q use the same memory location. That's why you got a different result from the p location. @@ramakrishna4092
@@ramakrishna4092 this is because the union somepointer now holds the pointer to s, and s is a single byte. When you dereference p.p, you are dereferencing a pointer to a FOUR byte type, so the CPU fetches not only the single byte that the pointer in somepointer is pointing to, but also the next three bytes after wherever s is on the stack, thus returning a nonsensical value.
Unions are awesome when you're making a programming language with dynamically typed variables. It's nice to have a struct with the value type information and a union storing the actual value.
I'm working on a university project to make a Python assembler in C. The teachers invented a .pys filetype that works as a middle step between .py and .pyc and we are programming the assembler that converts .pys to .pyc in C. We use unions exactly for what you've described.
AFAIR it's done exactly this way in PHP. It's internally called zvalue.
Thanks for this. One of the C books from back in the day (©1988) concluded a very short chapter introducing unions with the unhelpful statement: "The author hasn't yet found a reason for using union structures [sic] instead of solving the problem with other C data types."
A neat thing about unions is that you can use them to serialize a float to send it over some serial link like UART: a float element and a uint32_t element, and that's it. Then send the uint32_t byte by byte and assemble it back into a float on the other end, via a pointer or another instance of the same union. Serialization like this is also useful with structs to send them along their serial way. You don't need a union to do that, but it's a much cleaner way than with pointers.
iirc there's some dumb obscurities in the language that says type-punning with unions is implementation-defined behavior or something like that. The most correct way is to use memcpy to bit-cast (the compiler is smart enough to optimize it) or, assuming your compiler supports it, use __builtin_bit_cast / std::bit_cast.
@@Minty_Meeo memcpy on a tiny microcontroller? The implementation is known, since you know what CPU you are running on. His method is definitely useful and used.
I have been using Linux for over 15 years and struggling to learn C on my own by reading manpages and trying to decipher source code. I have learned more in a few hours on your channel than all those years combined. Thank you for what you do.
In Soviet Russia we share the memory comrade no structs or classes allowed
Soviets have collapsed, now the world is doing it the USA way.
A perfect communist utopia shouldn't have any classes
But structs are data. At least in C.
Missed opportunity to call it Soviet Union
Great Soviet, long live Soviet!
I usually use a union for a UART receive buffer . Then you can use the data type as a uint8_t or a “char” which also uses 8 bits .
Thanks. Explained it well enough for me. Going over structs in a course right now, and the instructor just flat-out says "don't use unions. They are dangerous." What you gave is a perfect use case of when I should consider them.
As with all things low-level programming, they are only dangerous in the hands of a rookie, but powerful in the hands of someone who knows what they are doing. Much appreciated.
I absolutely love how you explain all the concepts with a little hands-on video of their implementations.
I have been waiting for this video. Thanks for talking about unions.
You are my top source of C lessons. Thank you for your efforts.
Yeah, I knew a few of you were waiting for it. Glad I could help.
Bold of you to discuss unions without mentioning hardware peripherals where the same configuration space will have different register meanings depending on the version or mode of the peripheral being configured. That's like 95% of the usage for unions.
Very useful with matrices and vectors. Have a union with an anonymous struct inside holding the individual elements, and an array with the size of the number of elements in the top level union. Then you can change/get the values of the elements by their names (like x, y, z etc) or iterate over them using the array.
Unions seem like one of those things that aren't generally useful, and very error prone, but would have some very good niche cases. Could be useful for something like fast inverse square root, or some case where type punning is useful. Coming from C++, polymorphism would be a lot less error prone I would think for the use case shown here though.
I've been programming C++ for like 3 years now and this is the clearest way someone has explained this. Most people just handwave the question and just say it's "some old thing from C, don't worry about it."
This channel is a literal goldmine for someone learning C and C++. Thank you so much!!
ruclips.net/p/PLlrATfBNZ98dudnM48yfGUldqGD0S4FFb
@@xeridea you are a god send!!! I needed this since my programs difficulty curve just shot into space! Thank you so much man!!!!
A union is very useful for networking, because sometimes we work with structured data (a struct) that has to be turned into bytes before being sent. So you put the struct in a union.
Very informative and excellent example.
I really got a more profound understanding that some tools aren't supposed to work very efficiently out of the box, but rather give bigger advantages when used in combination with other stuff; like structs in this example.
C unions are very powerful, they basically allow you to implement pattern matching and inheritance without advanced language constructs
How
@@zeinfeimrelduulthaarn7028 look it up
@@marusdod3685 yes sorry that was a stupid question aha, ty for answering, i'd have ignored myself if i were you
have union and enum in a struct. use enum to indicate which variable of union is valid. use switch to do operations on a struct depending on what you need. Simpler than OO, right?
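The enum-plus-union pattern described above could look roughly like this. It is only a sketch: `struct value`, `enum value_kind`, and `value_format` are hypothetical names, and the anonymous union assumes C11.

```c
#include <assert.h>
#include <stdio.h>

/* The enum is the tag: it records which union member is currently valid. */
enum value_kind { VALUE_INT, VALUE_FLOAT, VALUE_STRING };

struct value {
    enum value_kind kind;
    union {                 /* anonymous union (C11) */
        int i;
        float f;
        const char *s;
    };
};

/* Format a value into buf, switching on the tag to pick the valid member.
   Returns what snprintf returns (the number of characters needed). */
static int value_format(const struct value *v, char *buf, size_t len)
{
    switch (v->kind) {
    case VALUE_INT:    return snprintf(buf, len, "%d", v->i);
    case VALUE_FLOAT:  return snprintf(buf, len, "%.2f", v->f);
    case VALUE_STRING: return snprintf(buf, len, "%s", v->s);
    }
    assert(!"unknown value_kind");
    return -1;
}
```

The switch is the whole "dispatch" mechanism: adding a new kind means adding an enum constant, a union member, and one case.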
I understand the concept and its potential application very quickly. Your explain style and immediate example is just top-notch. Nice.
You explained a lot about the advantage of saving memory, but glossed over the fact that you can stomp on values already saved. Would like to know more about the cons/pitfalls when using these.
I use unions to reinterpret data from different types, or extract individual bytes from an integer, or combine multiple bytes into an integer, etc.
I've used unions a lot. Any time you have a record format with a variable area, you can union together several structs to define the area.
You can also union a struct with a buffer. Read the record into the buffer name, and refer to all the fields by the struct names.
Hey, do you mind posting any examples in code for those mentioned applications?
I use unions to get at byte values for floating point numbers for rounding, un-signing, etc.
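As an illustration of the "un-signing" case, here is a sketch that clears the sign bit of a float through a union. It assumes a 32-bit IEEE-754 `float` (true on most current platforms, but not guaranteed by the C standard); `union float_bits` and `clear_sign` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: float is 32 bits wide, as on typical IEEE-754 platforms. */
static_assert(sizeof(float) == sizeof(uint32_t), "this sketch expects a 32-bit float");

union float_bits {
    float f;
    uint32_t bits;          /* the same 4 bytes, viewed as an integer */
};

/* Clear the sign bit, which yields the absolute value for IEEE-754 floats. */
static float clear_sign(float x)
{
    union float_bits u;
    u.f = x;
    u.bits &= UINT32_C(0x7FFFFFFF);
    return u.f;
}
```

Reading a member other than the one last written is allowed in C (the bytes are reinterpreted), but, as other comments here note, the same trick is undefined behavior in C++.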
Also the different values in a union can themselves be structs which makes for some pretty powerful either/or representations.
I've not seen an unnamed union in a struct before. Nice trick.
Also, unions can be used to serialize a struct, which is handy on a microprocessor when sending data over a serial link.
I use something like this for handling the registers of an embedded device:
struct {
    // stuff...
    union {
        struct {
            uint16_t foo;
            uint16_t bar;
            uint16_t baz;
            // etc...
        };
        uint16_t regs[REGISTER_COUNT];
    };
} the_device;
The communication protocol requires the registers to be identified by numbers (hence the array), but the source code is more readable if I can write
the_device.foo
rather than
the_device.regs[INDEX_OF_FOO]
the example was amazing, thank you!
The final example was brilliant.
And yes I'm a huge Star Wars fan!
Oh lord. I've declared a union and it worked. Nice! I'm gonna be breaking my compiler mixing them with templates :)
I recently wrote a (pretty bad :D) JSON parser as a training exercise and was wondering how to best implement the dynamic typing... Ended up just using void pointers and lots of casting... Looks like unions would've been the proper way to do it. Good to know!
Hi, thank you very much for your videos. All the videos I've seen from your channel are pure gold. Thank you for so much, sorry for so little. Greetings.
I only learned recently the hard way and very quickly that using a union is the equivalent of typecasting the variable, not the literal being assigned to it, i.e., ((type) variable) = value, not variable = (type) value. The reason I found out was because I was writing as uint32_t and then reading as uint64_t. In debug, I noticed the low 32 bits had the correct value, but not the high 32 bits. Sure enough, it turns out I had mistakenly believed when I first was introduced to C that it had the same behavior as casting the assignment value. This differs in that, at least for x86, casting a uint32_t to uint64_t would generate a movzx (move with zero extend) instruction, whereas with a union instead, we just get a movd instruction (move dword).
It was for creating a simple output parameter that gives two status codes: general_cause, and specific_cause. This would permit the user of the procedure to determine what should be done with the output based on the status codes given, in this case, for an HTTP API that I implemented using WinHTTP. The solution was to set the values to zero at the start of each procedure that provides these output parameters.
Certainly an improvement over exceptions to use this simple method, but unfortunately, C doesn't have a nice way of clearly separating input parameters from output parameters, and polluting the return values with interprocedure state metadata is absolutely unacceptable as the return value should return the result of its execution--things that are inherently void type shouldn't return a bool or otherwise for information about operation success or failure because that is accidental to the procedure and makes the procedure interface ambiguous in the best case, and misleading in the worst case. It's incredibly cringe to see if (!WinHttpProcedureThatReturnsNothing(...)) { fatalf(...); } as opposed to WinHttpProcedure(..., &status); if (status.general_cause == HTTP_FAILURE) { fatalf(...); }.
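The uint32_t/uint64_t pitfall described in this comment can be reproduced with a small sketch. The names `union status_word`, `store_via_union`, and `store_via_cast` are hypothetical; note also that the C standard leaves the bytes not covered by the written member formally unspecified, so the "leftover" behavior is what common compilers do in practice rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>

union status_word {
    uint32_t u32;
    uint64_t u64;
};

static_assert(sizeof(union status_word) == sizeof(uint64_t), "union is as large as its largest member");

/* Store through the narrow member: only 4 of the 8 bytes are written,
   much like ((uint32_t) variable) = value. */
static uint64_t store_via_union(uint64_t old_contents, uint32_t value)
{
    union status_word w;
    w.u64 = old_contents;
    w.u32 = value;
    return w.u64;           /* the other 4 bytes typically keep their old contents */
}

/* Convert the value instead: the result is zero-extended to 64 bits. */
static uint64_t store_via_cast(uint32_t value)
{
    return (uint64_t)value;
}
```

This is why zeroing the whole union first, as the comment describes, makes the wide read come out clean on a little-endian machine.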
I use unions when working on embedded systems where, for example, I need to transmit a uint64_t variable via infrared LED to an A/C. This uint64_t contains multiple commands and parameters. With a union you can use a single uint64_t of memory and divide it into multiple uint8_t, uint16_t, and uint32_t variables, or even define the number of bits for each member inside this variable. Each member is a command or a parameter for the A/C.
Actually you are sending a set of: mode, temperature, fan speed, etc in one uint64_t variable.
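A rough sketch of that packing idea follows. The field layout is invented for illustration (real A/C remotes define their own), whole bytes are used instead of bit-fields to keep the layout predictable, and `union ac_command`, `ac_build`, and `ac_temperature` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical A/C command: one 64-bit word, also viewable as named fields. */
union ac_command {
    uint64_t raw;               /* the value handed to the IR transmitter */
    struct {
        uint8_t  mode;
        uint8_t  temperature_c;
        uint8_t  fan_speed;
        uint8_t  flags;
        uint32_t reserved;
    } fields;
};

static_assert(sizeof(union ac_command) == sizeof(uint64_t), "fields must fit exactly in 64 bits");

/* Fill in the named fields and return the packed 64-bit word. */
static uint64_t ac_build(uint8_t mode, uint8_t temperature_c, uint8_t fan_speed)
{
    union ac_command cmd = { .raw = 0 };
    cmd.fields.mode = mode;
    cmd.fields.temperature_c = temperature_c;
    cmd.fields.fan_speed = fan_speed;
    return cmd.raw;
}

/* Pull one field back out of a packed word. */
static uint8_t ac_temperature(uint64_t raw)
{
    union ac_command cmd = { .raw = raw };
    return cmd.fields.temperature_c;
}
```

Which bits of `raw` each field lands in depends on the machine's byte order, so the receiver has to agree on the same layout.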
I have used unions like this in the past. In fact, major vendors like Texas Instruments use this technique in their hardware abstraction libraries. But using this in the networking stack is a bad idea. This technique relies on implementation-defined behavior, which makes your code less portable. Depending on your business priorities, it might be OK. But in the long run, it causes huge issues. One of my previous codebases made heavy use of this technique. Due to cost and availability issues, we had to change our microcontroller, and we ended up having to rewrite most of our firmware.
Total newb, but I know what unions are for and it honestly blows my mind how clever it is. The common understanding I have seen in videos is that it saves memory, but you can also intentionally do variable shadowing or have a variable "interpreted as" another type.
I searched for ages to find a tutorial on how to use it, and when I saw this... I was in paradise.
🌴🌴🌴 Glad I could help.
Writing my own Base64 conversion function was a great place to trot out a union.
The 24 bits of binary expand into 32 bits of printable ASCII, and vice versa...
You gotta be a bit careful about 'endian', but it's a clean solution lacking the usual plethora of 'temp' variables and oddball loops...
And, on my 32-bit system (for my own use), I sometimes make a union of a (native) double with 2 ints or a pointer and an int, or whatever, allowing functions to return multiple values melded into a single package. Yes, it's a non-portable hack, and one needs to be careful, but it partially overcomes C's aversion to passing structs...
I don't understand why you wrote "typedef" for the struct. Whenever I address a struct I instantiate using its name. Example:
struct foo {
int x;
};
I would just use:
foo myFoo;
And go on with my day and it works fine. I never had to write "struct foo myFoo;". Care to explain?
thank you for making a video on this, youve seen my comment!
edit: btw, shouldnt we use the %zu format specifier when printing a size_t?
Rule 18.4 of MISRA C:2004 (a required rule) says that unions shall not be used. MISRA C++:2008 says the same in rule 9-5-1.
In MISRA C:2012 this rule was lowered from "required" to "advisory" (rule 19.2).
Personally, I also try to avoid unions. But there are some situations where a union is the only or the best way. Some examples are shown in the comments.
For your example, Jacob, a C++ programmer would advise using polymorphic classes. In C, I could imagine using a struct with pointers to the specific data (personality, firmware). Only one of these pointers should be non-NULL. With this, you can also drop the boolean flag which distinguishes between robot and human being.
Nevertheless: This is again a great episode which helps me to think about data concepts. Thank you!
Thanks.
In Summary - Unions help to save space.
Hi Jacob, great video. If you are looking for video suggestions, one I'd be interested in is why there seems to be such a holy war between using typedef and not using it. It seems like half the people I ask are vehemently against them, and the other half use them all the time. I'm not totally sure I understand the reason against. Thanks
Thanks. I'll see what I can do. The argument is generally about type obfuscation.
Thanks for the video!
Like some people mentioned in the comments, another use case that arose from this shared memory feature is data packing/unpacking in serial communication.
I found a technical article that summarizes it nicely:
"Union in C Language for Packing and Unpacking Data" - by Dr. Steve Arar
Just realized I could use this in project. I had a Header struct and two different node structs that had the header as the first member plus different data arrays that occupy the same space. Now I unioned the data arrays and no longer have to access the header members through object->header.member. One little issue is that I need to know the size of the header to determine how big the data arrays can be and I'm not sure of the best way to do that... I want the struct to always be one memory page in size.
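One possible way to handle the sizing question above is to derive the payload size from `sizeof` of the header and let the compiler check the total. This is only a sketch: the 4096-byte page size, the header fields, and the names `NODE_PAGE_SIZE`, `struct node_header`, `struct node`, and `node_key_capacity` are all assumptions, and the header is kept as a named member so that `sizeof` can be applied to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_PAGE_SIZE 4096u    /* assumption: one memory page is 4096 bytes */

struct node_header {
    uint32_t kind;
    uint32_t count;
};

/* Whatever the header does not use is available for the data arrays. */
#define NODE_PAYLOAD_BYTES (NODE_PAGE_SIZE - sizeof(struct node_header))

struct node {
    struct node_header header;
    union {                     /* the two data layouts share the same space */
        uint32_t keys[NODE_PAYLOAD_BYTES / sizeof(uint32_t)];
        uint8_t  bytes[NODE_PAYLOAD_BYTES];
    };
};

/* Compile-time check that the node really is exactly one page. */
static_assert(sizeof(struct node) == NODE_PAGE_SIZE, "node must fill exactly one page");

/* Number of uint32_t keys that fit next to the header. */
static size_t node_key_capacity(void)
{
    return sizeof(((struct node *)0)->keys) / sizeof(uint32_t);
}
```

If a header field is added later, the arrays shrink automatically and the `static_assert` catches any layout that no longer adds up to a page.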
You have such a good way to teach... Do you intend to release any course or something? It would be amazing
Thanks. Short answer. Yes, probably...stay tuned...just too busy for my own good, sometimes.
I used a union once when I had to read a CFG file that can store a flag, a string, or a float.
It's the easiest way to read the value once and then convert it later.
Very nicely explained professor, thanks a lot!
awesome explanation, thanks!
Unions are so useful. It's the basis for inheritance. I use it all the time. :)
I just wrote a program in notepad and have not tested it, but you will understand the basics of what I am trying to do.
#include <stdio.h>

enum TYPE
{
    RECT, CIR
};
// Child Structure - 1
struct Rectangle
{
    // Width & Depth
    float b, d;
};
// Child Structure - 2
struct Circle
{
    // Radius
    float r;
};
// Parent Structure
struct Shape
{
    union
    {
        struct Rectangle R;
        struct Circle C;
    };
    enum TYPE type;
};
// Function to calculate Area
float calculate_area(struct Shape s)
{
    if (s.type == RECT)
        return s.R.b * s.R.d;
    else
        return 3.14f * s.C.r * s.C.r;
}
int main()
{
    struct Shape shp;
    shp.type = RECT;
    shp.R.b = 10; shp.R.d = 20;
    printf("Area = %f\n", calculate_area(shp));
}
@@sukivirus Looks great, any reason you have comments littered throughout instead of naming the variables that way?
@@R4ngeR4pidz I agree my bad
Won't virtual functions be much clearer for achieving this instead of unions?
With this approach you need to specify 'type' at compile time only.
@@267praveen C doesn't have virtual functions. If you were talking about C++, you may as well just use the OOP features for this case anyway…
IMO the unions rule when you handle some control protocol. You have two dozen different commands each with different parameters, so you make a struct command, union parameters, to pass the command around.
I use union in 16-bit mcu when I need to read the MSB or the LSB separately. I just think it is more readable than doing pointer arithmetic:
typedef union {
int word;
char byte[2];
} data16_t;
Unions can be used for type punning and polymorphism.
Finally a useful example of unions, you are the best! Thanks for all these videos.
You're welcome.
Very understandable and useful. Thanks!
Thanks for the video. Throughout my CS degree, unions were kinda skipped over lol, so I had no idea the difference. Only question I have is how it determines which is bigger when given pointers? Does it assume 4 bytes for the pointer or follow the address and use sizeof() internally?
On a typical 64-bit machine, pointers are always 8 bytes: `sizeof(char*)` == `sizeof(void*)` == `sizeof(struct very_long_struct*)` :)
Pointer size varies by machine. Some microcontrollers have 2-byte or 4-byte pointers. Your typical 64-bit machines (my laptop and probably yours) have 8-byte (64-bit) pointers. And, yes, on a particular machine, the size of a pointer is the same no matter what it points to.
Is your example a typical use case of unions? why not use class inheritance to specialize instead?
I love your channel dude, keep it up!!
Thanks.
It feels like conceptually we could replace the idea of a struct with an interface like in Java or a protocol in Swift, or in some other way via subclasses. Obviously the point of C is to not have classes, but still, just a thought.
@8:37 Shouldn't line 30 be "c.firmware_version" rather than "c->firmware_version" since it is an int and not int* ?
I've used unions so I could do floating-point bithacks relatively easily
thanks so much, Jacob
You're welcome.
Thank you this finally clicked for me!!
OK, so basically unions are good to use when only one variable out of many is valid at a time, since they let you optimize the code's memory use ^^
You are great, please make more video lectures.
Thanks.
Could you maybe avoid endianness issues by template specialization? Is there a predefined trait for this?
One thing I am confused by is what is the difference between accessing a struct property via -> vs dot notation? I've Googled a bit on this and it seems like they are not exactly interchangeable depending on whether or not the property is a true pointer.
You use “->” when you have a pointer: it's either
a_struct.a_member
or
a_pointer_to_struct->a_member
The latter is just syntactic sugar for
(*a_pointer_to_struct).a_member
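The same point as a small compilable sketch, using a hypothetical `struct point` and two made-up helper functions:

```c
#include <assert.h>
#include <stddef.h>

struct point {
    int x;
    int y;
};

/* We have the struct itself, so members are reached with the dot. */
static int sum_by_value(struct point p)
{
    return p.x + p.y;
}

/* We have a pointer to the struct, so members are reached with the arrow.
   p->y and (*p).y mean exactly the same thing. */
static int sum_by_pointer(const struct point *p)
{
    assert(p != NULL);
    return p->x + (*p).y;
}
```

Whether the member itself is an int or a pointer makes no difference; what matters is whether the thing on the left of the operator is a struct or a pointer to one.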
A small note on how the endianness of CPU affects the alignment of elements in the union would be helpful. I believe you might have already covered it in one of the videos, but would be useful here for someone new, since we are already in this context.
FYI, my previous organization's 20+ year old firmware (still evolving) has this common datatype which consists of a structure of union of all datatypes like int, bool, float, string, etc in it. The variable of this complex datatype is thrown around everywhere in a highly complex-interdependent environment, over network and across various platforms, which seems very complex at first but is a lifesaver once you understand it. The actual datatype of this variable, and the value it holds is evaluated only at the source and the intended destination. I think it's like a simple encapsulation.
That sounds interesting can you talk more about it?
a better use case for unions imho is network programming.
It's pretty useful when writing a gameboy emulator because each of the cpu's registers can either be treated as 1 16-bit reg or 2 8-bit registers.
I wanted to say the same. You can also mix bitfields in with it too (perfect for HW registers), so you have nice setter/getters for the fields while also able to read/write the whole 16/8bit value at once. Makes code a lot easier to read too.
Where have you been my whole undergrad C programming life????
Thanks
If a union has multiple pointers, and we modify the union and then use those pointers, will that cause a segmentation fault?
Maybe, maybe not. It depends on what you're doing with those pointers, but the pointers will share the same location, so writing to one and then to the second will overwrite the first. So, if you access the first, you'll get the value you stored for the second. If that first pointer was important, you might have lost important data (and possibly leaked memory). But whether or not it seg faults depends on what address you stored for the second one, and what you try to do with it.
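A small sketch of that overwrite, assuming (as on common platforms) that `int *` and `char *` have the same size and representation; `union two_pointers` and `second_write_replaces_first` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

union two_pointers {
    int  *ip;
    char *cp;
};

/* Assumption checked at compile time: both pointer types are the same size. */
static_assert(sizeof(int *) == sizeof(char *), "this sketch expects equally sized pointers");

/* Write the first pointer, then the second, then look at the first again.
   Returns true when the first member now holds the second address. */
static bool second_write_replaces_first(int *first, char *second)
{
    union two_pointers u;
    u.ip = first;
    u.cp = second;          /* reuses the storage that held `first` */
    return (void *)u.ip == (void *)second;
}
```

Nothing crashes here because the stale pointer is only compared, never dereferenced. Dereferencing `u.ip` at this point would read an int's worth of bytes from wherever the char lives.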
@@JacobSorber I haven't actually programmed for 20 years or so, but I used unions for the flexibility they give you. I was under the impression that, as far as the pointer is concerned, the stored value is the same, and the compiler's interpretation of it varies with the member (variable) used in the source code.
Is this behavior still correct?
#define struct union
Anyone could tell me why the size of the struct is 12 bytes, when: int = 4 bytes, float = 4 bytes and char = 1 byte? The other 3 bytes are used for what?
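The extra bytes are padding that the compiler adds so that every element of an array of these structs stays properly aligned. A sketch for checking this on your own machine, using a hypothetical `struct sample` with the member order int, float, char (the actual numbers depend on the platform; 12 bytes with 3 bytes of tail padding is typical where int and float are 4 bytes):

```c
#include <assert.h>
#include <stddef.h>

struct sample {
    int   i;
    float f;
    char  c;
};

/* The size of a struct is always a multiple of its alignment,
   which is what forces the padding after the lone char. */
static_assert(sizeof(struct sample) % _Alignof(struct sample) == 0, "size is a multiple of alignment");

/* Bytes at the end of the struct that no member occupies. */
static size_t tail_padding(void)
{
    return sizeof(struct sample) - (offsetof(struct sample, c) + sizeof(char));
}
```

Printing `sizeof(struct sample)`, `offsetof(struct sample, c)`, and `tail_padding()` with `%zu` shows where the bytes go on a given compiler.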
I'm personally against unions. They're fun and powerful, but their downside is that the system endianness also matters, meaning that little and big endian would produce different results.
If possible, I'd always prefer bitwise operations over union, and in C++ std::variant also may come handy rather than using union and a type.
Structs vs C++ tuples?
My preference is structs/classes (if necessary) by default as tuples make for a weird mess if not used properly in addition to a bunch of templating madness.
@@SimGunther If possible I'd always pick structs/classes, and in many cases structs/classes are actually faster than tuples.
I can see endianness only matters when using a union for things like joining smaller registers into a larger register, eg
union reg_pair
{
    struct
    {
        int8_t b, c;
    };
    int16_t bc;
} mypair;
on a little endian system this would swap the two 8 bit registers being used together, but not on a big endian system.
If you're using the union to merge different data together then endianness isn't going to matter. I used a union to merge the different data-section types in a file: each record was the same length but depending upon a record header the actual data stored differed: read record, check header of record and access the data as appropriately.
I'm starting my computer science major next month, we are gonna be best friends buddy
Alright. Let's do this. 😀
Best of luck.
I use a union, when my program has to convert e.g. 4 char into an int32_t (maybe an ipv4 address), which make it very easy without a bunch of bit fiddling. ;o)
Another good reason, if you need a kind of variant variable type.
Please share a code example, as it seems like an interesting case.
@@267praveen
This is a generic 4 bytes to int converter
#include <endian.h>  // To get __LITTLE_ENDIAN
#include <stdint.h>  // For uint8_t, etc. typedefs
// toInt() bytes to int converter.
typedef union u_char2Int {
    char ac4Bytes[4];
    uint32_t uint32;
} t_char2Int;
/*******************************************************************************
* Name: toInt
* Purpose: Converts up to 4 bytes to integer.
*******************************************************************************/
int toInt(char* pc4Bytes, int iCount) {
    t_char2Int tInt = {0};
    for (int i = 0; i < iCount; ++i)
#if __BYTE_ORDER == __LITTLE_ENDIAN
        tInt.ac4Bytes[i] = pc4Bytes[i];
#else
        tInt.ac4Bytes[i] = pc4Bytes[iCount - i - 1];
#endif
    return tInt.uint32;
}
Can you give us some projects to work on to go advanced with c ?
Haven't really watched the video yet, but be careful that in c++ ( don't know about c ) a lot of the ways you use a union is UB.
When you assign a value to one of its fields, you should only use that field.
Another lovely video
I have a question :)
What is the difference/similarity between a struct in C versus a hash-table or maybe even 'object'??
I'm more in a 'head learning' place rather than actually programming. The picture I have in my head is that a struct in C may be like a hash-table or object in a language like C++, or in a scripting language like PowerShell on Windows.
Thank you
A struct is like a class (object) without methods (member functions in C++ lingo) . In C++ struct and class are almost the same. In a hash table lookup is happening at runtime, while for a struct the compiler knows exactly which member variable is located where in memory. You certainly can use structs to build a hash table.
First comment 😊
Thanks for this video...I was hoping to see this discussion... thanks!😊
You're welcome.
Where can I see the content of Makefile?
Around 0:50, I think.
Could you use this to store a large data packet and access each byte separately? Something like this:
union {
    uint32_t data;
    struct {
        char b1;
        char b2;
        char b3;
        char b4;
    };
};
Yes, sort-of. You can easily run into endianness issues if you're not careful, and make sure you pack the struct to keep your compiler from inserting padding between the members. But, otherwise, yes this will work.
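A variation on the idea above that sidesteps the padding question by using a byte array instead of four separate members. It is a sketch with hypothetical names (`union packet`, `packet_xor`); the XOR checksum is chosen only because its result does not depend on byte order.

```c
#include <assert.h>
#include <stdint.h>

union packet {
    uint32_t data;
    uint8_t  bytes[4];      /* array elements are contiguous, so no padding */
};

static_assert(sizeof(union packet) == sizeof(uint32_t), "byte view must cover the whole word");

/* XOR all four bytes together. Which byte is bytes[0] depends on
   endianness, but the XOR of all of them does not. */
static uint8_t packet_xor(uint32_t data)
{
    union packet p;
    uint8_t acc = 0;
    p.data = data;
    for (int i = 0; i < 4; ++i)
        acc ^= p.bytes[i];
    return acc;
}
```

Code that cares about which byte comes first (for example, when sending over a wire) still needs to handle endianness explicitly.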
Way too few views for such good content!
Why is it not necessary to malloc the char* pointer?
Because string literals are statically allocated.
And it is set to point to the start of the constant string.
I often use unions.
In this case, an alternative would be to use a void pointer as void* versionOrPersonality. Cast it to the correct type if the isRobot flag is set to true. I recall that we were taught to almost never use unions as in today's scenario, they are considered bad code. A place where I've seen it is to extract bits from a number.
Not that I have a lot of experience but I am yet to see production code that uses unions.
And that example shows once again why one should switch to C++: Just have a factory class Character with two children Person and Robot.
main()
{
char *p = 's';
printf("%s",p);
}
It's printing nothing.
As soon as I change the format specifier to %c, it works.
Can anyone tell me the logic behind this.
's' is a char (a usually 1-byte integer). "s" is a char* (contains the address of a block of memory containing the 's' character followed by a null character). You're assigning p (a pointer) to be that integer value (rather than an address). With "%c", you're telling printf to interpret p as a character. So, the code is odd, but things do work, because both lines are forcing p to be used as a char. With %s, you're telling printf to treat it like a pointer to some characters, and since you didn't set up p that way, you could get a number of different outcomes (most likely no output, garbage, or a seg fault), depending on what is stored at address 115.
@@JacobSorber Thanks a Ton..
What's the difference between union and a struct? Structs don't stage walkouts 🙂🙃🙂🙃
Hi, thanks. So basically the memory for the "char *personality" is allocated/deallocated automatically?
No.
In the code given, when the char* "personality" of hanssolo is assigned, it is assigned[1] as a pointer to the constant string (and the version_type then contains this address viewed as an int), but the "personality" of r2d2 contains the value 42, which would very likely cause a segmentation fault if it were dereferenced.
[1] at least it was when I did most of my C programming back in the '80s and '90s.
Would be awesome to watch you code in Rust, just saying haha
Nah
@@Hellohiq10 Yah...Rust is the future bruh
@@parlor3115 maybe for c++
Good PJ.. hahaha.
@@Hellohiq10 Even for c. There's a project where an OS is being built with rust so the language is pretty capable. Plus, it's time to have a modern language for embedded systems. C is ancient and should just die.
Can you make a union of structs?
Yes
Sure.
In C++ you don't... you use std::variant. (Unions in C++ just have too many corner cases with undefined behavior.) So it is better to limit your explanation to "C" only.
C unions remind me of Pascal unions.
And Fortran EQUIVALENCE
No mention of tagged unions? en.m.wikipedia.org/wiki/Tagged_union
What he showed in the video is basically a tagged union
@@homelikebrick42 That's my point exactly! Names are hard, but without using a name for the functionality being demonstrated makes it difficult to remember the concept (for people like me at least :^). Categorization is important.
Thanks for filling in the gaps in what I said (and didn't say).
Han Solo is not a robot, but Rick Deckard is.
Use std::variant instead.
That doesn't exist in C
I thought you are gonna write "bool ismale" lol.