I love that the algorithm suggested this to only people who already understand hashing and want to know why you never specifically explain hashing. Great video by the way. Keep up the good work.
Alice doesn't ever send any algorithm to Bob. She only sends an argument to a preexisting algorithm they both know. Also, this is basic hashing used all the time.
I mean, WOW, this is a harsh comment section. As someone (apparently in the minority) who didn’t immediately know of the ‘clear and obvious solution’ of hashing, I thought you did a great job setting up the premise of the problem to a layperson-how can you verify data with high accuracy by sending fewer than n bits?-and explaining it using an example that makes fairly implicit sense. Whether this example is the best or most efficient is mostly irrelevant, because your goal was not to suggest actual practices in data verification, only to touch on the theory behind it. Keep up the good work, I love your presentation style and look forward to more videos!
Not really. Assuming a flat random distribution of numbers between 0 and n, the most efficient way to communicate the number is to send the whole number digit by digit. Random noise is incompressible.
What are capital A and capital B? Are they the same as the lowercase versions? Edit: isn’t the “a number C with at least m distinct prime divisors, must have C > 2^m, and so therefore log_2(C) > m, and so any positive natural number C has at most log_2(C) distinct prime divisors”, like, way simpler than appealing to the \calO(log(N)/log(log(N))) result? And like... with the \calO in there, shouldn’t that not tell you what constant factor is involved?
To be clear, doing this is not a good idea because there are ways to easily forge a fingerprint of this type. Much more advanced and up to date fingerprint methods should be used in practice.
There's no adversary in the scenario in the video (apart from their mobile service provider, obviously, clearly a supervillain). Forgery isn't an issue this stuff is trying to solve. It's not cryptography, it's fingerprinting. But mod is a horrible fingerprint for obvious reasons.
05:30 Why are you assuming that the combination of height and weight would be more unique than a fingerprint? In my opinion this is the wrong choice. Fingerprints are the very unique thing, whereas you may always find people with same height and weight (well, kg and cm) in a group of people big enough.
Hamming codes will only reliably detect differences of a single bit or two bits. If your message is of any significant size, there could actually be literally billions of other possible messages that will share the same redundancy bits (and generate false positives)...
@@foogod4237 For any finite fingerprint there must be infinite messages that generate this fingerprint. This is regardless of whether you use mod or hamming or crc or sha or md5. Or md4. Or anything. This is because of the pigeonhole principle. Also explained partially in this video actually, at the beginning. False positives are always guaranteed possible. There could actually be literally billions and trillions and infinite other possible messages that share the same redundancy bits. There will be. There always are.
Here's a simple improvement to your algorithm to demonstrate that your measure of "accuracy" is completely meaningless. Suppose instead that the algorithm is: Alice sends no information, and Bob always concludes that the passwords are different. Now let's do the same calculation you've done. What's the probablility that Bob makes an error, given that the passwords are different (Pr[error] as you put in the video)? Well, given that the passwords are different, and that Bob always concludes that the passwords are different, he is never incorrect in these cases. So P[error] = P[guessed "same" | a does not equal b] = 0. It's important to just have a sanity check on how useful the information is. It's easy to guess 2 passwords are different, because that happens most of the time. We really want to know when they are the same, and both of these algorithms are useless at determining when passwords are the same. Almost all of the positive results will be false positives
In your example probability of error in case where passwords are equal is 100%. This is bad algorithm no matter what is probability for non-equal case.
Agree. There are people which have problems with focusing on speech when there is too much background noise. They are lost with such plingpling videos.
Um. Uh. Why is p being sent? Can't they just both generate the same p deterministically? Also, sha256 is a thing. Among many others. Just send n bits of the hash, and the chance of collision is then 1/2^n. Of course that makes for a really short video. But the upside is that this is more practical. It's the way this would be done in the real world. But the real wtf is that they're paying $1 per message. I know China has a firewall but this is just ridiculous. Also, it is illegal to share Netflix passwords.
The purpose of this video was obviously to explain the mathematical background behind the general concept of fingerprinting, which can be applied in various ways to all kinds of applications, if you understand how it actually works. Saying "just use SHA256" doesn't help anyone understand anything about information theory or how these sorts of algorithms actually work. (Or in other words, this is an information theory video, not a programming how-to video)
@@foogod4237 If it's so obvious then why are you telling me this? To be clear, I was well aware this was a math video. Sha256 isn't about programming, it's about math. Math gets used a lot in computers. This mod crap is just useless in so many ways. It can also fail quite spectacularly. By doing mod, you only check the modulus, which is only the low digit in a mod-n base number system. It's checking the lowest bit only for p=2 for instance. The other bits can vary freely yet still produce the same modulus. So terrible. Also applicable to cyclical signals such as waveforms. No one uses this stuff anywhere. So what's this video about? No one uses mod-n for fingerprinting because it's very bad for fingerprinting. So this video is still crap. And whether you put your math in code or write it with xs and ys, alphas and betas or pi and tau, that's not relevant. In the end it's all math.
@@xbzq Any given modulus might have a bunch of collisions, but the chance that the two given passwords collide on a randomly chosen prime p in the specified range is very low. And this is why the primes are chosen randomly. As for generating the same prime deterministically, whilst also still having that prime be uniformly sampled, they'd either need some kind of synchronisation, or to exchange some other kind of information, which lies outside of the scope of the described protocol. The conclusions of the video clearly show that this protocol works very well when the two passwords are uniformly randomly sampled, so there will certainly be many places where this is useful and the video illustrates that fingerprinting is indeed possible, without discussing any very complicated fingerprinting protocols that are actually used. Which might not have been intuitively obvious to someone who hasn't come across fingerprinting before. Which may introduce people to the concept of sublinear communication, which I believe was the purpose of the video
@@charlottedarroch It's not a given that this information is passwords. Mod collisions are common. Shor's algorithm relies on it, for instance. Even if there's no exact multiple, we still may get a beat frequency. Mod is terrible. Also, a 45 min video explaining this junk is just overkill. It's a bad video.
This video can be 10 seconds long and end with ”yes, use hashing.”
That's what I was afraid of
Thank you. I'm 3 minutes in and scrolled down to see if it was hashing or some variant of a ZK protocol. I'll stop now.
hash collisions?
I love that the algorithm suggested this to only people who already understand hashing and want to know why you never specifically explain hashing.
Great video by the way. Keep up the good work.
Alice doesn't ever send any algorithm to Bob. She only sends an argument to a preexisting algorithm they both know. Also, this is basic hashing used all the time.
Why didn't you mention hashing at all?
I mean, WOW, this is a harsh comment section. As someone (apparently in the minority) who didn’t immediately know of the ‘clear and obvious solution’ of hashing, I thought you did a great job setting up the premise of the problem to a layperson-how can you verify data with high accuracy by sending fewer than n bits?-and explaining it using an example that makes fairly implicit sense. Whether this example is the best or most efficient is mostly irrelevant, because your goal was not to suggest actual practices in data verification, only to touch on the theory behind it. Keep up the good work, I love your presentation style and look forward to more videos!
Gosh this comment section is awfully mean. I think it's quite a good introduction to the concept of fingerprinting. Thanks for the video :)
"Isn't it magic how you can send less bits than the total and still be able to recover the original data?"
Prime factorization: "Bonjour"
Actually the prime factorisation of a number can be stored more efficiently than the number itself
@@guydror7297 yeah that's what I'm saying. Data compression isn't a new thing
Not really. Assuming a flat random distribution of numbers between 0 and n, the most efficient way to communicate the number is to send the whole number digit by digit. Random noise is incompressible.
What are capital A and capital B? Are they the same as the lowercase versions?
Edit: isn’t the “a number C with at least m distinct prime divisors, must have C > 2^m, and so therefore log_2(C) > m, and so any positive natural number C has at most log_2(C) distinct prime divisors”,
like, way simpler than appealing to the \calO(log(N)/log(log(N))) result?
And like... with the \calO in there, shouldn’t that not tell you what constant factor is involved?
To be clear, doing this is not a good idea because there are ways to easily forge a fingerprint of this type. Much more advanced and up to date fingerprint methods should be used in practice.
There's no adversary in the scenario in the video (apart from their mobile service provider, obviously, clearly a supervillain). Forgery isn't an issue this stuff is trying to solve. It's not cryptography, it's fingerprinting.
But mod is a horrible fingerprint for obvious reasons.
05:30 Why are you assuming that the combination of height and weight would be more unique than a fingerprint? In my opinion this is the wrong choice. Fingerprints are the very unique thing, whereas you may always find people with same height and weight (well, kg and cm) in a group of people big enough.
I would solve this problem using Hamming codes, sending only redundancy bits, which should be enough information.
Hamming codes will only reliably detect differences of a single bit or two bits. If your message is of any significant size, there could actually be literally billions of other possible messages that will share the same redundancy bits (and generate false positives)...
@@foogod4237 For any finite fingerprint there must be infinite messages that generate this fingerprint. This is regardless of whether you use mod or hamming or crc or sha or md5. Or md4. Or anything. This is because of the pigeonhole principle. Also explained partially in this video actually, at the beginning. False positives are always guaranteed possible. There could actually be literally billions and trillions and infinite other possible messages that share the same redundancy bits. There will be. There always are.
Here's a simple improvement to your algorithm to demonstrate that your measure of "accuracy" is completely meaningless. Suppose instead that the algorithm is: Alice sends no information, and Bob always concludes that the passwords are different.
Now let's do the same calculation you've done. What's the probablility that Bob makes an error, given that the passwords are different (Pr[error] as you put in the video)? Well, given that the passwords are different, and that Bob always concludes that the passwords are different, he is never incorrect in these cases. So P[error] = P[guessed "same" | a does not equal b] = 0.
It's important to just have a sanity check on how useful the information is. It's easy to guess 2 passwords are different, because that happens most of the time. We really want to know when they are the same, and both of these algorithms are useless at determining when passwords are the same. Almost all of the positive results will be false positives
In your example probability of error in case where passwords are equal is 100%. This is bad algorithm no matter what is probability for non-equal case.
🧬
Yet another RUclips video ruined by the addition of background "music". I had to quit 1 minute in.
Agree. There are people which have problems with focusing on speech when there is too much background noise. They are lost with such plingpling videos.
Exactly. I wonder why people do this.
Um. Uh. Why is p being sent? Can't they just both generate the same p deterministically? Also, sha256 is a thing. Among many others. Just send n bits of the hash, and the chance of collision is then 1/2^n. Of course that makes for a really short video. But the upside is that this is more practical. It's the way this would be done in the real world. But the real wtf is that they're paying $1 per message. I know China has a firewall but this is just ridiculous. Also, it is illegal to share Netflix passwords.
The purpose of this video was obviously to explain the mathematical background behind the general concept of fingerprinting, which can be applied in various ways to all kinds of applications, if you understand how it actually works. Saying "just use SHA256" doesn't help anyone understand anything about information theory or how these sorts of algorithms actually work.
(Or in other words, this is an information theory video, not a programming how-to video)
@@foogod4237 If it's so obvious then why are you telling me this? To be clear, I was well aware this was a math video. Sha256 isn't about programming, it's about math. Math gets used a lot in computers. This mod crap is just useless in so many ways. It can also fail quite spectacularly. By doing mod, you only check the modulus, which is only the low digit in a mod-n base number system. It's checking the lowest bit only for p=2 for instance. The other bits can vary freely yet still produce the same modulus. So terrible. Also applicable to cyclical signals such as waveforms.
No one uses this stuff anywhere. So what's this video about? No one uses mod-n for fingerprinting because it's very bad for fingerprinting.
So this video is still crap. And whether you put your math in code or write it with xs and ys, alphas and betas or pi and tau, that's not relevant. In the end it's all math.
What is illegal is not always immoral, and what is legal is not always moral
@@xbzq Any given modulus might have a bunch of collisions, but the chance that the two given passwords collide on a randomly chosen prime p in the specified range is very low. And this is why the primes are chosen randomly. As for generating the same prime deterministically, whilst also still having that prime be uniformly sampled, they'd either need some kind of synchronisation, or to exchange some other kind of information, which lies outside of the scope of the described protocol. The conclusions of the video clearly show that this protocol works very well when the two passwords are uniformly randomly sampled, so there will certainly be many places where this is useful and the video illustrates that fingerprinting is indeed possible, without discussing any very complicated fingerprinting protocols that are actually used. Which might not have been intuitively obvious to someone who hasn't come across fingerprinting before. Which may introduce people to the concept of sublinear communication, which I believe was the purpose of the video
@@charlottedarroch It's not a given that this information is passwords. Mod collisions are common. Shor's algorithm relies on it, for instance. Even if there's no exact multiple, we still may get a beat frequency. Mod is terrible. Also, a 45 min video explaining this junk is just overkill. It's a bad video.
#SOME3