9:18 The trailing 1 is added at the next byte. So if you have:
01101010
it will be padded like this:
01101010[100000000000000000000000000000........ 448] + (64 bits of message size)
If there isn't enough space for the 64 bit size field, the block will be padded to the end and the size will get its own block. So like this:
Block 1: 01101010......[1000000000000000000000000000000........ 512]
Block 2: [000000000....... 448] + (64 bits of message size)
I know this because I made my own implementation of the SHA-1 and SHA-2 algorithms
How do hash functions prevent creating collision free hashes if the functions are not communicating with each other or keeping track of all the hashes ever created?
is that U of Nottingham cup supposed to be some kind of product placement? it's like the camera is trying to keep it in frame and it doesn't even look like it's been drunk out of. also cool Rubik's cubes on the shelf
Given the whole of Computerphile is to some extent an endorsement of the University of Nottingham it seems unlikely, or at least unnecessary. More likely it happened to be part of the initial framing shot the camera operator wanted to avoid drifting from too much.
So a hash function can protect against doctoring a message. How do you prevent the insertion or deletion of a message in stream of messages? Each can be hashed, but you could create a new message, hash it, send it and its deemed good. Do you have a secure cryptographic sequence number than can be embedded in any way?
"How do you prevent the insertion or deletion of a message in stream of messages?" Before sha'ing you just append a shared secret. That way someone intercepting the message on route won't be able to produce a valid hash for an altered message. The recipient verifies the integrity of the message by sha'ing the message with the shared secret appended to it. "Do you have a secure cryptographic sequence number than can be embedded in any way?" If you mean some "sequence" number that appears to change randomly from one message to another, yet is known/anticipated by the recipient, than that's basically their shared secret, except it's not static. However, in this scenario getting out of sync would mean that all the following messages would fail their integrity checks, until some sort of reset. That makes it trivial to do a DoS attack on the protocol/exchange. One common way to counter this is to reset every minute or two, but then the communication would have to be (close to) real-time. Such a sequence can be any sufficiently random pseudo-random number generator sequence.
What happens if the last block has, let's say 504 bits, and the last 8 bits does not have enough room to store the length of the message? Wouldn't the padding scheme break down?
What to stop someone from precomputing all of the possible hashes, and saving it to a file that can be read as an array, then doing the same with the things it was hashing being saved to a different file. When someone wants the reverse hash of something, open the file and look up the position of the hash within that file, then look up that same position in the un-hashed file. or is it faster to just generate all possible combinations on-the-fly until finding a hash that matches.
That is actually a possibility, called a rainbow table. One way around it is to use a salt: when a user first creates an account, you generate a random string of characters, append it to the password and then hash it. The random string is stored in your db alongside the hash. This also makes it so you have to crack each user's password individually.
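To make the salting idea concrete, here is a minimal sketch in Python. It is not what any particular site does: the function names and the iteration count are made up for illustration, and instead of a single hash of password plus salt it swaps in `hashlib.pbkdf2_hmac`, a standard-library key derivation function that mixes in the salt and repeats the hash many times to slow down guessing.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor for this sketch, not a recommendation


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt, stored next to the digest
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt_a, digest_a = hash_password("hunter2")
salt_b, digest_b = hash_password("hunter2")
# Same password, different salts, so a single precomputed table cannot cover both users.
print(digest_a != digest_b)
print(verify_password("hunter2", salt_a, digest_a))
```

Because every user gets a different salt, an attacker has to redo the guessing work per account rather than once for the whole database.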
What if the length of the message is 511 bits? Then we have only 1 bit of padding, and we can't possibly store the number '511' in 1 bit of information.
I don't think it should, because the end of the padding contains the message length. If you know the message is n bits long, then you know exactly where message ends and where padding begins, no matter what other patterns they might contain. (Right? I think that's right...)
Tx for the video :-). Maybe someone can help me with this question: What determines the resulting hash? On the one hand it is totally random, on the other hand it is consistent? Is it a super huge complex formula, so that it is better to randomly guess instead of solving the formula? Or is the NSA the only one who has the formula?
I'm curious about the books on the shelf whose titles I can't read. They are the 4th, 6th,7th, and 11th books from the left. I don't think I care so much about the 12th book from the left. Does anyone know the titles of those books? I think I want those books.
If sha1 is not reversible how come git does not screw up your files. I thought the whole point of hashing is to be able to get the original input somehow given some extra information available only to you. Probably I'm jumbling up encryption and hashing.
you are confusing encryption and hashing. git uses hashes as an identifier for a state while hoping that two hashes of different states will never be equal. (so far, to my knowledge, that never happened.)
A hash is *never* reversible. This is the purpose of a hash - not being able to get the original data back, but have a (somewhat) unique identifier of the original data.
I'm confused , what is that "abcde" stand for ? and why is the loop be done 80 times ? and the text is 512 bits long right ? how do I convert them into H0-H4 which is 160 bits in total ? thanks
Actually that process involves using x-or function ,you can see it on the net about the way the abcde is changed into a different abcde it is pretty interesting
I know you're not 'languagephile' but is there a real reason for nought and zero being so stark in contrast? also: if you have a message between 502 and 511 (inclusive) the padding would try to tack on 10 extra bits, how is that resolved? (10 bits because 1, then #of bits which is 9 in length)
it's not used for encrypting a message. the most common use is storing passwords: you store the hashed password when an account is created. when the user logs in, you hash the password they enter and compare it with the saved hash.
If that's how it works, it is very easy to find collisions: 1. Hash 20 bits long data 2. Copy the 512 long data that have been created (by the rules of padding one followed by zeros plus the size) Then you have two inputs that are essentially the same who share the same output. So I think there is a lot more sense in applying those rules no matter what the size of the input is, and adding 512 bits blocks to the end if needed. I think this is how the SHA works.
That's how it works, yes - it's always padded. Without padding you can easily append whatever you like at the end of a message; an important part of the integrity check is to tell where the message ends. It's not at all easy to find collisions, though. When you hash 20 bits of data you're actually hashing 512 bits of data as the algorithm only works with exactly 512 bits at a time, i.e. one block. The remaining 492 bits must therefore be padded in a consistent way - if you pad it this way and I pad it that way, we'll end up with different 512 bit blocks, which in turn will result in very different hashes. If the message length (in bits) modulo 512 is 447 or less, there's room for the padding which is one 1, followed by however many 0's needed to get to 448 bits. Finally, the 64 bit length of the message is added (which brings it up to exactly 512 bits). If there's not enough room, additional 0's are added in a following block, up until there's only 64 bits left. (If the message length modulo 512 is 0, then the final block will consist of nothing but a 1 followed by 447 0's and then the length.)
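The padding arithmetic described above can be checked with a few lines of Python. This is just a sketch of the counting, assuming the usual layout of one 1 bit, then zeros, then a 64 bit length field; `sha1_padding` is a made-up helper name, not part of any library.

```python
from typing import Tuple


def sha1_padding(message_bits: int) -> Tuple[int, int]:
    """Return (zero_bits, total_bits) for a message of the given length in bits."""
    # After the single 1 bit, add zeros until 64 bits remain in the current block.
    zero_bits = (447 - message_bits) % 512
    total_bits = message_bits + 1 + zero_bits + 64
    return zero_bits, total_bits


for bits in (0, 20, 447, 448, 480, 504, 511, 512):
    zeros, total = sha1_padding(bits)
    print(f"{bits:4d} message bits -> 1 + {zeros:3d} zeros + 64 length bits = {total} bits")
```

A 447 bit message needs no zeros at all and still fits in one block, while 448 bits and up (504 and 511 included) spill into a second block, which answers the "what if there is no room for the length" questions in this thread.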
Who the heck writes the subtitles for these videos and why are they so badly wrong? They managed to mishear 'SHA' as both 'char' and 'shower' within the space of like a minute. They're not auto-generated... or are they?
There is a bigger number of messages that you can hash than there are possible hashes that you can generate. So yes, you can. But the better the hashing algorithm the less likely such a collision. A hashing algorithm is seen as cracked when someone can look at a hash and find out a message that can generate the exact same hash.
When you think about it it is obvious that there are collisions. The output of the algorithm is fixed length, but the input can be arbitrarily long. That means that there are infinitely many more possible inputs than there are outputs, and therefore some inputs will give the same outputs.
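The pigeonhole argument is easy to see in practice if you shrink the output. The sketch below keeps only the first 2 bytes of a SHA-1 digest (an artificial truncation chosen purely for illustration) and hashes the strings "0", "1", "2", ... until two of them land on the same 16 bit value. With only 65536 possible outputs a repeat is guaranteed within 65537 tries.

```python
import hashlib
from itertools import count
from typing import Tuple


def truncated_sha1(message: bytes, prefix_bytes: int = 2) -> bytes:
    """SHA-1 digest cut down to its first few bytes (deliberately weak)."""
    return hashlib.sha1(message).digest()[:prefix_bytes]


def find_truncated_collision(prefix_bytes: int = 2) -> Tuple[bytes, bytes]:
    """Return two different messages whose truncated digests are equal."""
    seen = {}
    for i in count():
        message = str(i).encode("ascii")
        short = truncated_sha1(message, prefix_bytes)
        if short in seen:
            return seen[short], message
        seen[short] = message


first, second = find_truncated_collision()
print(first, second, truncated_sha1(first).hex())
```

The same search against the full 160 bit output is what makes real collisions so expensive: the space is far too large to walk through like this.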
Is there mathematical theory to prove that the resulting hashes are "evenly" distributed in the 2^256 or 2^160 space? If the resulting hashes are somewhat "clustered" in a space that is smaller than perceived 2^256, then the chance of collision would be higher
Mike Pound is by far my favorite person on this channel... he has the most interesting subjects, shines with crazy knowledge while still keeping the video fresh and dynamic.
I like him and his topics too, though the AI topics are interesting and the person explaining them is good too
he has great body language, tries to use it as much as possible
And a fair looker.
And the same accent as the 11th Doctor (Matt Smith)! :-D Where is that accent from?
Absolutely agree, Tom Scott is my second favourite, that guy is hilarious
I could sit and watch videos from this guy all day long, so informative and laid back
wrg
Love how these videos get STRAIGHT to the point.
Been watching a whole bunch of Mike's videos as a complement to my introductory module on Security and Authentication. One of the best teachers I have come across!
This is too much work, can’t we just trust each other?
That ,my friend, is the real problem
How can I trust other people when I can't even trust myself
@Mohamed Seid GodisGood666!
Dont trust verify
No Way!!!
I've been trying to understand the concept for 3 days from the slides my teacher covered and the book she shared and ended up with complicated mind, this video gave me a pure understanding in 10 mins. Great job!
Mike Pound is the best! I love hearing him explain things - keep em coming!
I am at a hackathon in Chicago, Illinois at Illinois Institute of Technology and I have to use SHA-1 on some facts before I pass them to an API so I can make a project for the hackathon. You did a wonderful job telling me what SHA-1 was so I could understand the cryptic API documentation. Thank you very much.
This is my favorite guy on this channel. I just love stuff like this.
Hmm, so far this is fairly straightforward, but the interesting part would be how exactly these compression functions work. Will there be a follow-up video on that?
In essence, it generates eighty 32-bit words derived from bits of the plaintext, then the state goes through left circular shifts, some XORs, some bitwise ANDs, addition with the round word and round constant, and then a permutation between all state variables
@@liljuan206 thanks, this really helped clearing things up
it isn't compression he is describing it is hashing. which is not what encryption is. which is what sha is. (notice the s part stands for secure).
@@liljuan206 how do they make it so it can't be reversed?
In essence Sha-2 uses 6 primary functions: Choice and Majority, and S0, S1, E0, and E1 all which move and permutate bytes around during compression
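For anyone in this thread who wants to see the SHA-1 "washing machine" spelled out, here is a compact pure-Python sketch. It follows the published SHA-1 description (eighty rounds, left rotations, four round constants) and is meant for reading, not for production use; the function names `rotl`, `sha1_compress` and `sha1` are just labels picked for this example. The result is compared against `hashlib` at the end.

```python
import hashlib
import struct
from typing import Tuple

MASK = 0xFFFFFFFF  # keep every word at 32 bits


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def sha1_compress(state: Tuple[int, ...], block: bytes) -> Tuple[int, ...]:
    """Mix one 64-byte block into the five-word state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):  # expand 16 message words to 80
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999  # choice
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1  # parity
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC  # majority
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6  # parity
        temp = (rotl(a, 5) + f + e + k + w[t]) & MASK
        a, b, c, d, e = temp, a, rotl(b, 30), c, d
    # Add the mixed values back onto the incoming state.
    return tuple((s + v) & MASK for s, v in zip(state, (a, b, c, d, e)))


def sha1(message: bytes) -> str:
    """Pad the message, run the compression function per block, return hex."""
    padded = message + b"\x80"  # the single 1 bit, byte aligned
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", len(message) * 8)  # 64-bit length in bits
    state = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
    for i in range(0, len(padded), 64):
        state = sha1_compress(state, padded[i:i + 64])
    return "".join(f"{word:08x}" for word in state)


print(sha1(b"abc") == hashlib.sha1(b"abc").hexdigest())
```

As for why it can't be reversed: each block's 512 bits are folded into 160 bits of state and then added back onto the previous state, so many different inputs lead to the same output and there is no unique way to run the rounds backwards.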
Roses are red
Violets are blue
Unexpected { on line 32
coding joke
A poetic compiler? I like that idea
Unresolved external symbol
Felt that on a spiritual level
Violets are blue
Roses are red
Your code isn't thread-safe
Use locks instead
Thought I was following until 9:35
He describes a way of padding that will produce the same padding string for messages with the same length - then says it's important that messages with the same length don't have the same padding string. Did something important end up on the editing room floor?
I'll check with Mike but I think it was just a slip of the tongue - ie The padding would be the same for messages of the same length but the messages would be different if they are different >Sean
No, "0010110" padded would be "0010110100000...", but "001011000" would be "001011000100000...", so the 1 (first bit of padding) would be later.
+Mat2095 He obviously meant if you just pad them with zeros.
How does the padding work if a block is 511 bits long?
aullik Considering almost all real-world data is stored as a stream of bytes (8 bit values), That's incredibly unlikely to ever come up.
It could be 504 bits, but 511 is highly improbable.
If your padding has to add at least 8 bits (one byte), then the thing he described works fine.
Remember working with individual bits is almost unheard of in computing.
If you have to store individual bits for storage efficiency, you pack them into bytes.
(similarly, if you store 7 bit values, you either store them in 8 bits and ignore a bit, or you pack it such that you store, say, 56 bit blocks. (7 x 8 - eg, 8 sets of 7 bits stored in 7 bytes)
aullik: Exactly the question that raised to my mind too :-) Since there isn't necessary enough bits left in the block to include the length of actual message.
You could add another block of 512 bits to the end to make it work.
+KuraIthys
Going with bytes, the longest message that could still be padded would be 496 bits long. 504 wouldn't work as you'd only have 8 bits left but 504 in binary is already 9 bits long.
+Kuralthys
I know that we usually work with bytes, but even if we say we have 512-8 = 504 bits, then we add one '1' bit to start the padding and now we only have 7 bits left. The message is 504 bits long but we can only store 128 distinct values (0 to 127) in 7 bits.
The only answer is that we expand to 1024 bits. But the question would be how do we expand. What is the "syntax" for the lack of a better word
Would you please explain the workings of the "washing machine"? ;-) I.e. the compression functions?
Thanks. I'll give this snippet a look. :-)
I love this channel so much...
I've always loved your videos and now I study computer science and can watch your videos for studying, it's amazing
What I want to know, for no particular reason, is if there are cases where a hash of a hash equals itself, of course sticking with one particular algorithm and hash length.
3:17 And the reasons why the NSA came out with SHA-1 to replace the earlier SHA-0 (or just plain “SHA”) were not revealed publicly. But the weaknesses in the original SHA were discovered independently a few years later. This was part of a sequence of evidence indicating that the gap between public, unclassified crypto technology and what the NSA has was narrowing, and may not be significant any more.
I think it's widening because look at Pegasus and with Pegasus 2.0 you only need phone number to target a victim.
And, Pegasus is joint project between Israel and USA. Imagine what NSA would have kept to themselves.
It is common understanding in the computer security field that if the government wants you, they have you.
The washing machine example really helped seal in this topic I was trying to understand and helped me on my final project. Thank you!!!
pound for pound Mike pound is the best narrator on computerphile
Thanks, Dr Pound (if you read this). I find your demeanour easy to engage with, and you set me off on the journey of understanding fully (with much work!).
9:40 I didn't quite understand how that padding scheme guarantees that messages with the same size would not share the same padding.
Mike you are my favourite person to appear on this channel. I enjoy your clear explanations and like the quite recent topics like Google Deep Dream, Dijkstra and so on.
My dealer need this.
😂😂😂😂😂
😆
🤣
You explained everything except for the part that actually matters. :(
You may as well have said, sha works by shaing things.
Exactly my thought :/
That they explain complicated things in an easier to understand manner. Sorta like every other video they make.
Ah, I see now...it's a washing machine with some knobs that does the sha'ing.
The compression function of SHA is where it gets quite complicated, and I don't think it would've fit into the scope of one video, as explaining it to someone with no prior knowledge isn't trivial, there's quite a bit of complicated math involved, and very few people actually understand the details of it.
YES exactly this..
Anyone notice the 'hacking' book on the shelf behind?
It doesn't look like anything to me
Hacking: The Art of Exploitation is a great book by Jon Erickson, which teaches you the basics of reverse engineering, code flow, basic C programming, the stack, networks and other things to get you started on binary exploitation. It's a great book, I recommend it to anyone who's willing to invest time in learning how to hack properly.
lol
cyancoyote is knowledge of a programming language required?
cyancoyote Thanks for the reply. I've heard by many people that C is a very hard language to learn though... do you have any recommendations for introductory books to learning assembly?
Note to self: Don't use a regular monitor as a touch screen
Its a university flatron monitor, probably expendable.
How do you know the "1000000..." padding bits are for padding purposes, and not part of the actual data/plaintext itself?
That 011001011 he wrote down is actually the start of the SHA hash value for "abd". I wonder if that was intentional, because the odds of that happening randomly are less than one percent.
SHA Hashing Algorithm?
Secure Hashing Algorithm Hashing Algorithm
ATM Machine
RAS Syndrome
LAN Network
GNU's Not Unix...wait a minute
LCD Display
I love these videos when Dr. Mike Pound is in them.
Me: Explain SHA
Dr. Pound: Explains it
Me confused: Explain it to me like I'm 12
Dr. Pound: Explains it like I'm 12
Me still: Explain it to me like I'm 5...
How would the padding work if the final block of the message was long enough that you don't have enough padding room to say the number of bit in the message? So if the final block contained 510 bits you would have to pad in 9 bits(111111110) to say that the message is 510 bits, but you would end up with more than 512 bits.
The length field has a fixed size (which is sufficient enough) (also the field is not optional). The length of 10...0 is decided including the size of the length field i.e. you could jump over to the next block if required.
Can you talk about the colliding prefix issue? As I understand it once I find a collision with a file, I can continue to create collisions by appending the same thing to both files, and some how this allows me to create two meaningful files each with the same hash value where one might expect that any collision which might be found would be obviously fake because it would have to be made up of a bunch of random bits.
easy-going video which explains just enough about SHA algo to keep it simple. The details are better learnt once you "get" the basic idea.
Isn't padding used even if the message is already a multiple of 512 bits to avoid attacks?
This was very informative!
Question: Is there any significance to the initialization constants
h0 = 0x67452301
h1 = 0xEFCDAB89
h2 = 0x98BADCFE
h3 = 0x10325476
h4 = 0xC3D2E1F0
Or are they chosen "randomly"?
Thanks!
No, they could be any numbers. But the cryptographic community is very sceptical of numbers that come out of nowhere.
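One thing you can check yourself is that the five values quoted in the question are not random-looking at all. Written out as little-endian bytes they simply count up through the hex digits, back down, and then continue with a similar simple pattern, which is the usual way of showing nobody hand-picked them. A small sketch (the helper name `little_endian_hex` is made up here):

```python
import struct

# The initialization constants exactly as quoted in the question above.
H = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def little_endian_hex(words) -> str:
    """Pack 32-bit words as little-endian bytes and return them as hex."""
    return struct.pack(f"<{len(words)}I", *words).hex()


print(little_endian_hex(H))
# -> 0123456789abcdeffedcba9876543210f0e1d2c3
```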
Since SHA is deterministic, even though it is non-reversible, it is still possible to guess the messages behind the hashes of some reasonably short inputs. For example, under SHA-256 the string 'abc' ALWAYS produces ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad. If I have a large enough database plus computational power, I could probably guess some short messages, although not the entire novel.
That's exactly how most cracking is done. Hashed database against hashed database lol
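A toy version of that lookup idea, assuming the unknown message is at most three lowercase letters (an artificially tiny search space picked for this sketch): hash every candidate once with SHA-256, store hash to message in a dictionary, and "reversing" a hash becomes a lookup.

```python
import hashlib
from itertools import product
from string import ascii_lowercase
from typing import Dict


def build_table(max_len: int = 3) -> Dict[str, str]:
    """Map the SHA-256 hex digest of every short lowercase string back to the string."""
    table = {}
    for length in range(1, max_len + 1):
        for letters in product(ascii_lowercase, repeat=length):
            word = "".join(letters)
            table[hashlib.sha256(word.encode("ascii")).hexdigest()] = word
    return table


table = build_table()
target = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
print(len(table), table.get(target))
# -> 18278 abc
```

The table grows by a factor of 26 for every extra letter, which is why this works for short or common inputs and not for an entire novel.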
What if the message is only a few bits shy of a block, not enough room for padding bits as described?
If there's less than 65 bits of space left in the final block for padding, you just pad toward an extra block. For example if your message is 480 bits, you add a one-bit, 479 zero-bits, and the 64-bit length, giving total length 1024 bits = 2 blocks.
Matthijs van Duin thanks
What happens if your message is, say, 509 bits in length? How do you pad it if the length won't fit?
0:34 who made that visual ? :P
haha !
1:34 Well, by “completely changed” you mean that somewhere around 50% of the bits in the hash flip to different values. You don’t want them _all_ to flip (or flip according to any discernible pattern), otherwise that’s no longer quite so random.
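The "about half the bits flip" claim is easy to measure. This sketch flips each input bit of a sample message in turn, hashes the result with SHA-1, and averages how many of the 160 output bits changed. The sample sentence and the helper names are arbitrary choices for illustration.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def average_flips(message: bytes) -> float:
    """Average number of SHA-1 output bits that change per single input bit flip."""
    base = hashlib.sha1(message).digest()
    n_bits = len(message) * 8
    total = 0
    for i in range(n_bits):
        flipped = bytearray(message)
        flipped[i // 8] ^= 1 << (i % 8)  # flip exactly one input bit
        total += bit_difference(base, hashlib.sha1(bytes(flipped)).digest())
    return total / n_bits


print(average_flips(b"The quick brown fox jumps over the lazy dog"))
```

The printed average should sit close to 80, that is, half of the 160 output bits.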
I kinda want to make my own hashing algorithm now. It wouldn't be very good, it would just be some random jostling around of bits until it looks weird.
4:30
SHA-1
5:24
Compression Function.
6:29
Permutation.
7:36
SHA reversion.
Still a bit too confusing for me........ Can you make a video on Hashing VS Encryption? When is what used? If the hash always has less information than the actual file, why would you ever need to hash something in the first place?
Encryption is reversible, hashing is not. In hashing, the receiver only needs confirmation that the data is valid.
One example is password authentication. For security reasons, the server does not store a copy of the user's password, only a hash of it. When a user tries to log in, the server hashes the submitted password and compares it to the stored hash. Meanwhile, if the database gets breached, people can't use the password hash to find out the original password (other than by brute-forcing it).
Dr Mike Pound is the best! More videos with him please
Thank you so much. I had a hard time finding someone to explain it well
So the padding is only denoted by the last one with a trail of zeroes and a length at the end? That is not a prefix and without some other way of indicating that padding is present it is indistinguishable from data.
After a quick google search it appears that the padding is always present so it doesn't need to be a prefix.
What happens if a message is smaller than 512 bits but long enough for the padding part to not have any space left to store the length of the message?
Then you pad to 1024 bits (including the message length).
I just read on Wikipedia that the block size of SHA1 is 512 bits, but the internal state and output size is 160 bits. So, what's the difference between block size and internal state, and what happens to reduce 512 bits down to 160 bits and why?
The block size just says how much of the message the hash algorithm works on at a time. In this case, if you have a 1024 bit message, SHA-1 works on the first 512 bits first, then the next 512 bits. The internal state is where it "mixes" the input bits. That's five 4-byte words (Mike - and the specification - labels them H0, H1, H2, H3, H4). 5 words x 4 bytes x 8 bits = 160 bits.
For SHA-1, it actually first *extends* the 512 bits to 2560 - *eighty* 4-byte words. (It makes those "extension words" by xor'ing different parts of the original 512 bits together). Then it mixes each of the eighty words, one by one, into the state, using XOR's, AND's and bit rotations - also constantly rotating the words in the state. That way, the eighty words end up in five words. That's the compression.
The output is just the state when it's done. For some hashes (e.g. SHA-224, SHA-384), output size will be just a part of the final state. For SHA-1, the output is the entire state verbatim.
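For anyone who wants to see the 512 -> 2560 -> 160 bit flow in code, here is a compact sketch of SHA-1 in Python, following the publicly specified algorithm (FIPS 180-4). The function names rotl, sha1_pad and sha1 are my own, it only handles whole bytes, and it is meant for reading, not for production use (use hashlib for that).

```python
import hashlib
import struct

def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_pad(message: bytes) -> bytes:
    """Append a 1 bit (0x80), zero bytes, and the 64-bit big-endian bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", len(message) * 8)

def sha1(message: bytes) -> str:
    # Initial 160-bit state: five 32-bit words H0..H4.
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    data = sha1_pad(message)
    for off in range(0, len(data), 64):          # one 512-bit block at a time
        w = list(struct.unpack(">16I", data[off:off + 64]))
        for t in range(16, 80):                  # extend 16 words to 80 words
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        a, b, c, d, e = h
        for t in range(80):                      # mix each word into the state
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
            a, b, c, d, e = temp, a, rotl(b, 30), c, d
        # Add the mixed values back into the running state (mod 2^32).
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]
    return "".join(f"{x:08x}" for x in h)         # output = the final state

print(sha1(b"abc"))
print(sha1(b"abc") == hashlib.sha1(b"abc").hexdigest())
```

The last line compares the sketch against hashlib as a sanity check.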
I don't understand the last bit about padding. If changing a single bit will completely change the hash result, why can't we just fill up the message with 0s up to the next 512 multiple?
I think in that case "abc" and "abc0" would produce the same hash.
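A small sketch of that point, reading the "0" in "abc0" as a zero byte. pad_zeros_only is a hypothetical zeros-only scheme, not anything SHA-1 actually does; the comparison with hashlib shows the real padding keeps the two messages apart.

```python
import hashlib

BLOCK = 64  # block size in bytes (512 bits)

def pad_zeros_only(message: bytes) -> bytes:
    """Hypothetical naive padding: fill with zero bytes up to the next block."""
    return message + b"\x00" * (-len(message) % BLOCK)

a = b"abc"
b = b"abc\x00"

# With zeros-only padding, both messages become the identical 64-byte block,
# so any hash built on it would give them the same digest.
same_block = pad_zeros_only(a) == pad_zeros_only(b)

# Real SHA-1 padding (a 1 bit, zeros, then the length) keeps them distinct.
real_differs = hashlib.sha1(a).digest() != hashlib.sha1(b).digest()

print(same_block, real_differs)
```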
9:18 The trailing 1 is added at the next byte. So if you have:
01101010
it will be padded like this:
01101010[100000000000000000000000000000........ 448] + (64 bits of message size)
If there isn't enough space for the 64 bit size block the block will be padded to the end and the size will get its own block. So like this:
Block 1:
01101010......[1000000000000000000000000000000........ 512]
Block 2:
[000000000....... 448] + (64 bits of message size)
I know this because I made my own implementation of the SHA-1 and SHA-2 algorithms
Hey, can you briefly explain everything again? I want to know if I got my thinking right.
How do hash functions create collision-free hashes if the functions are not communicating with each other or keeping track of all the hashes ever created?
you can't. it's not a problem, if you cannot do it on purpose in reasonable time.
Isn't it unsafe to have a padding scheme that leads to pre-image collision? E.g., h(msg) = h(pad(msg)).
So basically it's a randomization function that is seeded with the data you give it, right?
@5:21 "We might talk about that in a bit", proceeds to encrypt that bit in sha and turns it to 160 bits
Is that U of Nottingham cup supposed to be some kind of product placement? It's like the camera is trying to keep it in frame, and it doesn't even look like it's been drunk out of. Also, cool Rubik's cubes on the shelf.
Given the whole of Computerphile is to some extent an endorsement of the University of Nottingham it seems unlikely, or at least unnecessary. More likely it happened to be part of the initial framing shot the camera operator wanted to avoid drifting from too much.
So a hash function can protect against doctoring a message.
How do you prevent the insertion or deletion of a message in stream of messages? Each can be hashed, but you could create a new message, hash it, send it and its deemed good.
Do you have a secure cryptographic sequence number that can be embedded in any way?
"How do you prevent the insertion or deletion of a message in stream of messages?"
Before sha'ing you just append a shared secret. That way someone intercepting the message en route won't be able to produce a valid hash for an altered message. The recipient verifies the integrity of the message by sha'ing the message with the shared secret appended to it.
"Do you have a secure cryptographic sequence number than can be embedded in any way?"
If you mean some "sequence" number that appears to change randomly from one message to another, yet is known/anticipated by the recipient, then that's basically their shared secret, except it's not static.
However, in this scenario getting out of sync would mean that all the following messages would fail their integrity checks, until some sort of reset. That makes it trivial to do a DoS attack on the protocol/exchange. One common way to counter this is to reset every minute or two, but then the communication would have to be (close to) real-time.
Such a sequence can be any sufficiently random pseudo-random number generator sequence.
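A hedged sketch of the shared-secret idea in code. Note that this swaps in HMAC (Python's standard hmac module) rather than literally hashing message + secret, since HMAC is the standard keyed-hash construction, and it uses a plain counter as the sequence number. The secret value and the function names tag and verify are made up for the example.

```python
import hashlib
import hmac

SECRET = b"hypothetical-shared-secret"  # assumption: agreed on out of band

def tag(seq: int, message: bytes, key: bytes = SECRET) -> bytes:
    """MAC over the sequence number and the message together."""
    return hmac.new(key, seq.to_bytes(8, "big") + message, hashlib.sha256).digest()

def verify(expected_seq: int, seq: int, message: bytes, mac: bytes,
           key: bytes = SECRET) -> bool:
    """Accept only the next expected sequence number with a valid MAC."""
    return seq == expected_seq and hmac.compare_digest(tag(seq, message, key), mac)

mac0 = tag(0, b"hello")

ok = verify(0, 0, b"hello", mac0)           # genuine first message
altered = verify(0, 0, b"hellp", mac0)      # doctored content
replayed = verify(1, 0, b"hello", mac0)     # old message replayed later
forged = verify(0, 0, b"evil", hashlib.sha256(b"evil").digest())  # no secret

print(ok, altered, replayed, forged)
```

Because the sequence number is covered by the MAC, an inserted, deleted or replayed message shows up as a sequence mismatch, and an attacker without the secret cannot produce a valid tag for a new one.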
What's amazing is the Tom Scott "rocket" animation didn't show up on a video from Dr. Pound
Messages of the same length don't share the same padding? How come?
I have no sound in either Chrome or Edge. The commercial at the beginning plays just fine. Other videos play fine.
Some people speak terrible, hard-to-understand English, and he is one of them. Even whole words were not completely spoken.
What happens if the last block has, let's say 504 bits, and the last 8 bits does not have enough room to store the length of the message? Wouldn't the padding scheme break down?
In that case, you pad up to 1024 bits.
Re watched it at least 10 times. Thank you for this explanation
What would be the padding if the final chunk of message is only 502 - 511 bits?
What's to stop someone from precomputing all of the possible hashes and saving them to a file that can be read as an array, then doing the same with the things that were hashed, saved to a different file? When someone wants the reverse hash of something, open the file and look up the position of the hash within that file, then look up that same position in the un-hashed file.
or is it faster to just generate all possible combinations on-the-fly until finding a hash that matches.
That is actually a possibility, called a rainbow table.
One way around it is to use a salt: when a user first creates an account, you generate a random string of characters, append it to the password and then hash it. The random string is stored in your db alongside the hash.
This also makes it so you have to crack each user's password individually.
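Roughly what the salt scheme described above looks like, as a minimal sketch. The function names are made up, and a single SHA-256 pass is used only to mirror the comment; real systems would normally use a deliberately slow function such as hashlib.pbkdf2_hmac or hashlib.scrypt instead.

```python
import hashlib
import hmac
import os

def new_record(password: str):
    """Create the (salt, digest) pair that would be stored in the database."""
    salt = os.urandom(16)  # random per-user salt, stored next to the hash
    digest = hashlib.sha256(password.encode() + salt).hexdigest()
    return salt, digest

def check(password: str, salt: bytes, digest: str) -> bool:
    """Re-hash the login attempt with the stored salt and compare."""
    candidate = hashlib.sha256(password.encode() + salt).hexdigest()
    return hmac.compare_digest(candidate, digest)

# Two users with the same password end up with different stored hashes,
# so one precomputed table cannot cover both.
salt_a, digest_a = new_record("hunter2")
salt_b, digest_b = new_record("hunter2")

print(digest_a != digest_b)
print(check("hunter2", salt_a, digest_a), check("wrong", salt_a, digest_a))
```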
What if the length of the message is 511 bits? Then we have only 1 bit of padding, and we can't possibly store the number '511' in 1 bit of information.
Then you pad to 1024 bits (two blocks)
The video's shots are like Modern Family, and that makes me happy! Also the information, so thanks!
A message can look like the beginning of the padding. For example, a message can contain a '1' followed by 500 '0's. Will this case ever generate a false padding state?
I don't think it should, because the end of the padding contains the message length. If you know the message is n bits long, then you know exactly where message ends and where padding begins, no matter what other patterns they might contain. (Right? I think that's right...)
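Here is a small sketch supporting that reasoning, at byte granularity. pad follows the SHA-1 style layout (0x80, zero bytes, 64-bit length); unpad is a hypothetical helper that only exists to show the length field removes the ambiguity, since a hash function never actually needs to unpad anything.

```python
def pad(message: bytes) -> bytes:
    """SHA-1 style padding: 0x80, zero bytes, then the 64-bit bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + (len(message) * 8).to_bytes(8, "big")

def unpad(padded: bytes) -> bytes:
    """Recover the message using only the length stored in the last 8 bytes."""
    bit_len = int.from_bytes(padded[-8:], "big")
    return padded[: bit_len // 8]

plain = b"abc"
tricky = b"abc" + b"\x80" + b"\x00" * 20   # message that looks like padding

print(pad(plain) != pad(tricky))
print(unpad(pad(plain)) == plain, unpad(pad(tricky)) == tricky)
```

The two padded blocks differ (the length fields alone are different), so the lookalike message cannot be confused with the padded short one.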
hi, please explain how you get new A B C D E? When you put 512 bits with initial A B C D E, you get new 512 bits, is it right?
what happens if I feed 511 bits? it's not a multiple for 512 but the space left is too short to save the length
Is it possible to superpose pseudo random number generators to increase the levels of randomness?
Thanks for the video :-). Maybe someone can help me with this question: What determines the resulting hash? On the one hand it is totally random, on the other hand it is consistent? Is it a super huge complex formula, so that it is better to randomly guess instead of solving the formula? Or is the NSA the only one who has the formula?
5:50 summarised the subject in 1 sentence ;-)
I'm curious about the books on the shelf whose titles I can't read. They are the 4th, 6th,7th, and 11th books from the left. I don't think I care so much about the 12th book from the left. Does anyone know the titles of those books? I think I want those books.
would love a video on SHA-3
It'd be amazing to see Dr.Pound reviewing some books from his collection. Get to know his technical interests apart from image analysis.
So can two different strings output the same result after going through the hashing function?
What if the message has 159 bits? How can you add the padding with its length if you just have one available bit to do the padding?
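There's no problem there: the padding fills up to the 512-bit block size, not the 160-bit output size. A 159-bit message gets a 1, then 288 zero bits, then the 64-bit length, making one 512-bit block.
If sha1 is not reversible how come git does not screw up your files. I thought the whole point of hashing is to be able to get the original input somehow given some extra information available only to you. Probably I'm jumbling up encryption and hashing.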
NextLevelNoob That is a one-way *trapdoor* function
you are confusing encryption and hashing. git uses hashes as an identifier for a state while hoping that two hashes of different states will never be equal. (so far, to my knowledge, that never happened.)
A hash is *never* reversible. This is the purpose of a hash - not being able to get the original data back, but have a (somewhat) unique identifier of the original data.
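To illustrate the git point: as far as I know, git's classic (SHA-1) object format names a file's contents by hashing a small header plus the bytes, and stores the contents under that name. So nothing is ever "un-hashed"; the hash is just the lookup key. The dictionary store below is a stand-in for git's object database, not its real on-disk format.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 of 'blob <size>\\0' + content, the id git gives a file's contents
    (classic SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# Stand-in for the object database: contents are kept, keyed by their hash.
store = {}

def put(content: bytes) -> str:
    oid = git_blob_id(content)
    store[oid] = content
    return oid

oid = put(b"hello world\n")
restored = store[oid]          # a lookup, not a reversal of the hash

print(oid, restored)
print(git_blob_id(b""))        # the well-known id of the empty blob
```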
Another video explaining SHA-256 would be awesome.
never been this early for a computerphile, dope
How does padding cope with a string length of, say, 510 bytes?
In that case padding extends into the next, newly added block.
But will there be any pattern inside those hashes?
Nice! Could you make a video about post-quantum cryptography please? It will be a great opportunity to learn more about this stuff
Loved the washing machine demonstration!
I'm confused. What does "abcde" stand for? And why is the loop done 80 times?
And the text is 512 bits long, right? How do I convert it into H0-H4, which is 160 bits in total?
thanks
Actually, that process involves using the XOR function. You can look up online how the abcde is changed into a different abcde; it is pretty interesting.
Can u explain also the "Bundestrojaner"? #Backdoor:W32/R2D2.A #Staatstrojaner #mfc42ul.dll
I always wondered how these things work. Great video
I know you're not 'languagephile' but is there a real reason for nought and zero being so stark in contrast?
Also: if you have a message between 502 and 511 bits (inclusive), the padding would try to tack on 10 extra bits. How is that resolved? (10 bits because of the 1, then the number of bits, which is 9 bits in length.)
I remember when SHA1 was actually still secure, and people could get away with MD5 (although it was starting to be frowned upon). Now I feel old.
Apple once tried to get away with MD4.
Elegant explanation. Thank you, Thank you, Thank you 😊👍
What is the point of encrypting a 'message' if it can't be decrypted back? Is it even a message then?
Correct me if I understood something wrong
it's not used for encrypting a message.
the most common use is storing passwords: you store the hashed password when an account is created. when the user logs in, you hash the password they enter and compare it with the saved hash.
I feel like a genius learning everything here!
Wait, but if it is fixed length, then there are a finite number of messages you could produce
Yes, there are multiple inputs that produce the same output. That's not a problem, if there is no way to find the other inputs though.
If that's how it works, it is very easy to find collisions:
1. Hash 20 bits long data
2. Copy the 512 long data that have been created (by the rules of padding one followed by zeros plus the size)
Then you have two inputs that are essentially the same who share the same output.
So I think there is a lot more sense in applying those rules no matter what the size of the input is, and adding 512 bits blocks to the end if needed.
I think this is how the SHA works.
That's how it works, yes - it's always padded. Without padding you can easily append whatever you like at the end of a message; an important part of the integrity check is to tell where the message ends.
It's not at all easy to find collisions, though. When you hash 20 bits of data you're actually hashing 512 bits of data as the algorithm only works with exactly 512 bits at a time, i.e. one block. The remaining 492 bits must therefore be padded in a consistent way - if you pad it this way and I pad it that way, we'll end up with different 512 bit blocks, which in turn will result in very different hashes.
If the message length (in bits) modulo 512 is 447 or less, there's room for the padding which is one 1, followed by however many 0's needed to get to 448 bits. Finally, the 64 bit length of the message is added (which brings it up to exactly 512 bits). If there's not enough room, additional 0's are added in a following block, up until there's only 64 bits left. (If the message length modulo 512 is 0, then the final block will consist of nothing but a 1 followed by 447 0's and then the length.)
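The rule above boils down to a little modular arithmetic. A sketch, working in bits so the 509 and 511 bit cases people keep asking about are covered (padding_layout is a made-up helper name):

```python
def padding_layout(message_bits: int):
    """Return (zero_bits, total_blocks) for SHA-1 style padding.

    Layout: message, a single 1 bit, zero_bits zeros, 64-bit length,
    with the total always a multiple of 512 bits.
    """
    zero_bits = (448 - (message_bits + 1)) % 512
    total_bits = message_bits + 1 + zero_bits + 64
    return zero_bits, total_bits // 512

for n in (0, 20, 447, 448, 480, 509, 511, 512):
    zeros, blocks = padding_layout(n)
    print(f"{n:4d} message bits -> 1 + {zeros:3d} zeros + 64 length bits, "
          f"{blocks} block(s)")
```

Anything up to 447 bits fits in one block; from 448 bits on, the padding spills into a second block.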
10:21
Who the heck writes the subtitles for these videos and why are they so badly wrong? They managed to mishear 'SHA' as both 'char' and 'shower' within the space of like a minute. They're not auto-generated... or are they?
Could you ever get the same hash result from two different original pieces of data?
There is a bigger number of messages that you can hash than there are possible hashes that you can generate. So yes, you can. But the better the hashing algorithm the less likely such a collision. A hashing algorithm is seen as cracked when someone can look at a hash and find out a message that can generate the exact same hash.
thanks for the replies. that makes a lot of sense now that you say it.
When you think about it it is obvious that there are collisions. The output of the algorithm is fixed length, but the input can be arbitrarily long. That means that there are infinitely many more possible inputs than there are outputs, and therefore some inputs will give the same outputs.
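You can watch the pigeonhole argument happen if you shrink the output. A sketch, assuming a toy "hash" that keeps only the first 16 bits of SHA-1 (tiny_hash and find_collision are made-up names): with 65,536 possible outputs, hashing 65,537 different inputs must produce a repeat.

```python
import hashlib

def tiny_hash(data: bytes) -> bytes:
    """Toy 16-bit hash: the first two bytes of SHA-1 (illustration only)."""
    return hashlib.sha1(data).digest()[:2]

def find_collision():
    """Hash the strings '0', '1', '2', ... until two share a tiny_hash.

    There are only 2**16 possible outputs, so trying 2**16 + 1 distinct
    inputs guarantees a repeat; in practice it shows up much sooner.
    """
    seen = {}
    for i in range(2**16 + 1):
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg
        seen[h] = msg

first, second = find_collision()
print(first, second, tiny_hash(first).hex())
```

The same argument applies to the full 160-bit output; the output space is just so large that nobody stumbles on a repeat by accident.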
Is there mathematical theory to prove that the resulting hashes are "evenly" distributed in the 2^256 or 2^160 space? If the resulting hashes are somewhat "clustered" in a space that is smaller than perceived 2^256, then the chance of collision would be higher
I would love to see a video about the compression function! :)