As a semiconductor guy I will tell you that chip developers have the same issue software developers do: “My way is better than their way.” As for why the chip makers haven’t just agreed on it, an old German boss once gave me the wisdom of “Don’t ask the pigs which one to slaughter.”
Well said. Thanks.
Just getting into network programming. Thanks so much for clearing up htons and the other weirdly named functions.
A topic related to endianness is bit flags, whose layout can change based on the compiler and byte layout.
I always found it amusing that the terms Big-Endian and Little-Endian were borrowed from Jonathan Swift's Gulliver's Travels and how to eat a boiled egg.
one of the best youtube channels :D
deep and concise. thx
Please make videos for socket with openssl library!
I'll add it to the list.
@@JacobSorber To avoid having to deal with endianness and not zero terminated binary data for sockets in C you could also base64 encode the data using EVP_EncodeBlock in OpenSSL.
@@JacobSorber Personally, given a choice, I'd prefer to learn about LibreSSL over OpenSSL.
I remember trying to work with PNG files at the binary level, and learning for the first time to use FILE with bytes rather than text. The little-endian stuff was happening and it was super frustrating. I wrote my own function to swap endianness, but the whole time I was thinking, "why the hell is it like that in the first place?"
A fun game is to write a Base64 en-/de-coder on a "littleEndian" machine...
(Base64 - transmitting binary data as text. Sender/Receiver each look out for themselves.)
Here is one reason to choose little endian: it correlates with polynomial degree. A base number system is really a polynomial, with the digits as coefficients.
With little endian, a digit's index is also its polynomial degree.
Example:
(1234)10 = 4*10^0 + 3*10^1 + 2*10^2 + 1*10^3, stored least-significant digit first as [4, 3, 2, 1].
From this polynomial representation, we can see that the index is the power of the base.
That's why little endian is convenient.
thanks for the tutorial. subscribed!
I was always wondering what on earth htonl was. We were given some boilerplate C code (which wasn't explained) as a starter to create an HTTP server for a course, and an invocation of htonl was involved.
On Linux,
% man htonl
will give you a pretty good explanation of the function. This is true of most of the standard functions. You may need to throw in a "3" to ensure you get the programming version versus a utility:
% man 3 htonl
Very helpful. I was confused earlier as to whether endianness also applies to arrays, e.g., would a "little-endian" OS store a string from a higher memory address to a lower one? But I'm pretty sure the answer is no. Big or little endian only applies to the byte order within data types that span multiple bytes/words, not to the order of those data types in a sequence. If I'm wrong, someone correct me.
Also appreciated the part at the end that showed the benefits of either method.
You are correct - at least for "left to right" languages like English. Text is stored in memory in the same order as it is read.
Actually an interesting thought I've never had - how do other languages, like right to left, get stored....
@@TomStorey96 How about top-to-bottom columns, like Mandarin?
@@edwardmacnab354 if I had to take a guess, I would say each column is stored as an array from low to high memory, but the on screen presentation produces the top down format. That would be my naive approach to it anyway.
@@edwardmacnab354 This is a really late reply, but it is considered wise to avoid storing data top-to-bottom, because hardware prefetchers like to access memory sequentially. It might not matter for small data structures, but for ones larger than the CPU cache it does.
I've dealt with endianness of floats too. One format (USNGA binary geoid) has no header, and each line (a parallel of latitude) begins and ends with 4-byte integers telling the number of bytes of 4-byte floats between them. If the number of bytes of floats were 65792, my program could get confused. In practice, the number of floats in a line is always a 5-smooth number like 21600.
So the bits within a byte effectively run in a big-endian sort of order, but the bytes themselves are in either big- or little-endian order. As can be seen, the per-byte hex values never reverse; they always stay the same.
Not to add too much confusion - there are actually other variants even (luckily mostly past tense) en.wikipedia.org/wiki/Endianness
Since we depend on big endian for the network, maybe we can check the machine's endianness; if it's already compatible, then we don't need to use the conversion tools. Otherwise, we use them. Does that make sense?
Sure. That would work. Though, it's probably best to implement this in the conversion tools/libraries (a lot of them already do stuff like this). It usually keeps things simpler in your code.
Things like htonl are likely conditionally compiled for each architecture they run on. You won't, for example, take some code compiled for x86 and run it on an ARM processor, so you'll have two binaries with the same higher-level code compiled for the lower-level nuances of the machine each runs on.
If compiled on a big-endian system, htonl could simply return its input value.
If compiled on a little-endian system, it would swap the bytes around and then return the value.
This way you, as the software writer, don't really need to think too hard about it. Just call it and work with the output assuming it has been "taken care of", and then your code is fully portable.
If you would need to *ensure* that it had been taken care of correctly, then you would write a test suite to check.
FEED BEEF!
DEC's PDP-10 was bigendian while the PDP-11 and VAX were littleendian.
Intel: "Open other end"... 😃
So if I serialize data on a machine the byte order matters? I'm still a little confused as to what he meant by "save" data.
When saving, you store a block of memory to, for example, a file on disk, in raw binary, the same as it is stored in memory.
In that case the byte order of a multi-byte unit (for example, an integer) must be known.
When the file is meant to be read on other machines, with a possibly different byte order, the byte order of that multi-byte unit _must_ be described in the file format description.
The software reading the file can act on this and convert the unit to the machine's byte order.
5:56 who isn't already subscribed? It's so easy to do! :)
I know, right?
I'm disappointed that Gulliver's Travels and the war between Lilliput and Blefuscu was not mentioned in your explanation.
Ah, true. Definitely a missed opportunity.
There should be high-heeled and low-heeled processors.
What I don’t get here is that text is binary too. Just encoded. So why doesn’t text need to worry about this?
Endianness typically only comes into play with multi-byte values (ints, shorts, longs,...).
@@JacobSorber Sorry, but I'm still not clear. A char's ASCII value fits within a byte, but a string/text is an array of characters, right? If a little-endian machine sends the text "hello" in an IP packet's payload and the receiving machine is big-endian, won't it interpret it as "olleh"? Am I missing some basic understanding? Should I do an hton-style operation on the IP payload?
@@xyzxyzspoon No, it won't, because each character is only a single-byte value.
What about multibyte characters?
@@maxaafbackname5562 Depends on the encoding.
For UTF8, the standard is designed so the encoding is always the same. A character may get encoded as multiple bytes, but those bytes will always be in the same order, defined by the standard.
For UTF16, there are actually two standards, UTF16LE and UTF16BE, which mean exactly what you think: Little Endian and Big Endian.
You're supposed to tell the difference by starting a UTF16 stream with a special character called the BYTE ORDER MARK, and the way that mark gets encoded (either FE FF or FF FE) will indicate whether the rest of the UTF16 byte stream is little- or big-endian.
An endian video with no mention of Unicode and its various endian representations? Okay, I'll bite (no pun intended) ... why?
I'd make an endianness joke by purposely misspelling my comment but sizeof(char) is 1 so it doesn't work. :/
just say it's using unicode characters
@@JohnHollowell Yes, but with the Unicode version of the joke, all but the world's most intense text aficionados would probably miss the punchline. They would die laughing, though.
hahaha xD this sounds like an LGBT thingy when I hear the "BI" one xD haha
Please add eloop and libevent API
OMG! "Little-Endianness" ONLY made sense when we were trying to access sixteen-bit memory addresses from CPUs that had only eight-bit accumulators.
But some chip makers (*intel*) still have their heads stuck in the eighties.
The ridiculousness of this is illustrated by the fact that the terms come from Gulliver's Travels, where the Lilliputians went to war with neighboring Blefuscu over which end of a soft-boiled egg should face up in the egg cup.
It's like which side of the road to drive on. No good reason, it's just the way they decided to do it when it made sense that traffic going one way should drive on one side and traffic going the other way should drive on the other side. Some countries have actually switched sides at one time or another in history.
Oh, after the standard is established you can always contrive justifications for why one way is better than the other, but it really doesn't make any difference.
I thought when we moved to sixteen and thirty-two bit computing with the 680x0 chips that we could put this issue behind us.
Little endian is how you and computers do addition: you start from the least significant digit. If you think it's backwards, then write your numbers backwards instead.