I would have expected / been tempted to switch the defaults here, so that no padding happens by default but you can turn it on as an optimization if you like. I could imagine wanting to serialize/deserialize structs to files and/or send them around; the fact that this is compiler/machine specific, and that the padding contains random bytes, is unexpected and kind of scary. So, for example, writing structs as bytes to files is a non-deterministic, machine-specific operation? Resulting in different SHA hashes for the exact same data? Yikes.
Yeah, the first time I ran into a bug caused by it with another student, we spent a whole afternoon scratching our heads. Least intuitive thing ever, but I trust the compiler masters to know what they're doing.
I should maybe look into how other languages like Rust or Go do that... 🤔
I agree it’s terrifying, but to play devil’s advocate… the compiler is there to translate the symbolic representation of a program into a CPU-specific sequence of data, right? I wouldn’t expect the compiled binary to hash identically across platforms, I’d expect the source code/another protocol to serve that purpose. It feels fair that the same terms and conditions should apply even for a little one-struct program. Otherwise, wouldn’t the spooky quirks/platform affordances have to be figured out somewhere between the binary on disk and the loaded process in memory? That would be tough to debug.
@AndrejKarpathy Unpacked should definitely be the default here. Retrieval from cache/RAM just works faster when loading from aligned memory addresses. Even if the struct binary layout were standardized, it wouldn't guarantee portability because of endianness, while the C implementations' ability to suit each architecture's memory quirks would be lost.
By the way, don't tell anyone, but practically all modern compilers on Linux and other Unix-like systems follow the System V ABI (struct alignment requirements summed up in agner's optimizing cpp guide Ch 7.20), so you can reasonably assume that layout on mainstream architectures if you want to write bad code. But otherwise, this is why things like protobuf exist.
Thank you for your videos btw! Definitely the best advanced neural net instructor.
@AndrejKarpathy Interesting point! As with many things, it's all about trade-offs. In certain cases, one might prioritize consistency in SHA hashes over performance. However, in performance-critical code where time is scarce but memory is abundant, it makes sense (for example) for underlying ML frameworks to prioritize executing tasks in x time rather than 2x (if it's about instructions). At least from a shareholder's perspective (or a Python developer's, lol).
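For anyone who wants byte-identical files regardless of compiler, one common approach is to skip dumping the raw struct and write each field explicitly in a fixed byte order. A minimal sketch, assuming a hypothetical `Record` struct (not the one from the video) and made-up helper names `record_serialize` / `record_deserialize`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example struct; the field names are placeholders. */
typedef struct {
    uint8_t  a;
    uint32_t b;
    uint16_t c;
} Record;

/* Size of the on-disk format: 1 + 4 + 2 bytes, with no padding. */
enum { RECORD_WIRE_SIZE = 7 };

/* Write each field explicitly, least significant byte first (little-endian). */
void record_serialize(const Record *r, uint8_t out[RECORD_WIRE_SIZE]) {
    assert(r != NULL && out != NULL);
    out[0] = r->a;
    out[1] = (uint8_t)(r->b);
    out[2] = (uint8_t)(r->b >> 8);
    out[3] = (uint8_t)(r->b >> 16);
    out[4] = (uint8_t)(r->b >> 24);
    out[5] = (uint8_t)(r->c);
    out[6] = (uint8_t)(r->c >> 8);
}

/* Rebuild the struct from the same fixed layout. */
Record record_deserialize(const uint8_t in[RECORD_WIRE_SIZE]) {
    Record r;
    assert(in != NULL);
    r.a = in[0];
    r.b = (uint32_t)in[1]
        | ((uint32_t)in[2] << 8)
        | ((uint32_t)in[3] << 16)
        | ((uint32_t)in[4] << 24);
    r.c = (uint16_t)((uint16_t)in[5] | ((uint16_t)in[6] << 8));
    return r;
}
```

The 7-byte buffer never contains padding and does not depend on the host's endianness, so hashing it should give the same result everywhere. This is the kind of problem that formats like protobuf solve for you.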
Struct members are padded to the size of the largest member, the 'alignment'. Because the largest member in your struct was 4 bytes, the 1-byte member receives 3 bytes of padding and the 2-byte member receives 2 bytes of padding, making a total of 12 (4 * 3). If you used an int64_t instead of an int32_t, you would see 8-byte alignment, for a total of 24 bytes (8 * 3). It actually has nothing to do with that 'mod 4 == 0' explanation but has to do with having a single boundary between the members without losing information. And that boundary is simply the size of the largest member in the struct. (For instance, make a struct with 3 chars and you will see a sizeof of 3, not 4, because 1 is the largest size and so is the alignment boundary.)
Interesting, I don't quite understand what you mean by "a single boundary between the members without losing information".
Quickly reading a Stack Overflow question (stackoverflow.com/questions/4306186/structure-padding-and-packing), it feels like I gave the correct explanation. I would think that accessing members that aren't aligned would be slower.
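One way to settle this kind of disagreement is to print the actual layout. The sketch below uses hypothetical structs (the member order in the video may differ) and a made-up helper `print_layouts`. On common ABIs each member is placed at a multiple of its own alignment and only the total size is rounded up to the largest alignment, which would explain why reordering the same three members can shrink the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical structs for illustration; not taken from the video. */
struct Original   { char a; int32_t b; int16_t c; };
struct Reordered  { char a; int16_t c; int32_t b; };
struct Wide       { char a; int64_t b; int16_t c; };
struct ThreeChars { char a; char b; char c; };

/* Holds on any conforming compiler: a member sits at a multiple of its own alignment. */
static_assert(offsetof(struct Original, b) % _Alignof(int32_t) == 0,
              "member b must be aligned for int32_t");

/* Print size, alignment and member offsets so the padding becomes visible. */
void print_layouts(void) {
    printf("Original:   size=%zu align=%zu a=%zu b=%zu c=%zu\n",
           sizeof(struct Original), _Alignof(struct Original),
           offsetof(struct Original, a), offsetof(struct Original, b),
           offsetof(struct Original, c));
    printf("Reordered:  size=%zu align=%zu a=%zu c=%zu b=%zu\n",
           sizeof(struct Reordered), _Alignof(struct Reordered),
           offsetof(struct Reordered, a), offsetof(struct Reordered, c),
           offsetof(struct Reordered, b));
    printf("Wide:       size=%zu align=%zu a=%zu b=%zu c=%zu\n",
           sizeof(struct Wide), _Alignof(struct Wide),
           offsetof(struct Wide, a), offsetof(struct Wide, b),
           offsetof(struct Wide, c));
    printf("ThreeChars: size=%zu align=%zu\n",
           sizeof(struct ThreeChars), _Alignof(struct ThreeChars));
}
```

Calling `print_layouts()` on a typical x86-64 or AArch64 build should report sizes of 12, 8, 24 and 3, though the exact numbers are implementation-defined. The 8-byte `Reordered` case is the one that "pad everything to the largest member" does not predict.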
Omg. As close to Horror as the genre of C instructional videos gets
taking the opportunity of this power of two episode, thanks for making this series ^^
ahahah, that means your thanks are going to get exponentially rarer
Great stuff, never knew this!
Thank you
Thanks !!!
Thanks
Thanks man
I heard that certain systems will crash if memory is not aligned, so in the best case it's slower, in the worst case it crashes lol
Oh, good to know thanks
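If you ever need to read a multi-byte value from an address that might not be aligned (say, from the middle of a byte buffer), a commonly recommended way is to copy it out with `memcpy` instead of casting the pointer. A minimal sketch, with hypothetical helper names `load_u32` and `store_u32`:

```c
#include <stdint.h>
#include <string.h>

/* Read a uint32_t from any address, aligned or not.
   memcpy has no alignment requirement on its arguments. */
uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a uint32_t to any address, aligned or not. */
void store_u32(void *p, uint32_t v) {
    memcpy(p, &v, sizeof v);
}
```

For example, `store_u32(buf + 1, x)` followed by `load_u32(buf + 1)` gives back `x` even though `buf + 1` is usually not 4-byte aligned. Compilers typically turn these small fixed-size copies into a single load or store where the hardware allows it.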