Thanks! That was my primary motivation for making the video - having implemented the hash myself, I found a whole lot of details that aren't all that obvious and worth discussing.
Wow, that was marvelous. You are an outstanding algorithm teacher. You give people just the right amount of explanation necessary for understanding a concept. Your videos are not long and not short; they are just the perfect length. Well done on that, and please keep going. The community needs more content like this one.
Super clear explanation to a complete beginner - was following MIT's course and was lost but this video completely breaks down the intuition behind why a rolling hash is computed this way! thanks
Hi Piyush, I posted a video on KMP string search algorithm recently. In case you haven't seen it, here is the link: ruclips.net/video/8xWrNQOC1Ts/видео.html Let me know what you think!
@@piyushmishra2348 Thanks for feedback and the suggestion to make more DP related videos, I do appreciate it. I have a few dynamic programming topics planned, but if you have anything specific, just let me know! =)
@@stablesort I went on a Computer Science blitz for the past two days, and couldn't find a reliable explanation on Big numbers. How to represent big numbers, multiply them(Karatsuba Multiplication for instance), add, raise them and divide them would be a good video to consider! Another topics that I couldn't find good explanations on are 2-4 Trees, and B-trees. Thanks for your reply!
@@ale-lx9gp thanks for the tip on the lack of good tutorials on those subjects. The best part is that I am only vaguely familiar with them, so it'll get me to research and learn a bunch in the process. Much thanks!
@@stablesort What happens when the hash value becomes negative when applying the rolling hash principle? Do we have to add the prime number (here 113) to it? Correct me If I'm wrong.
@@yagamiram Good question. You are right, you could end up with a negative number. However, as weird as it sounds, it's actually OK to take a MOD of a negative number. Negative numbers map to positive ones in modular arithmetic. For example, -2 in mod 10 is the same as +8 in mod 10, or +18 mod 10. So it's just like going counter-clockwise on the dial of numbers that goes from 0...9. So you could keep it as a negative number or, as you mentioned, you could convert it to a positive number if so desired.
Great explanation on Rabin-Karp algorithm. One question I have is that the modulo multiplication for very large numbers, for example, 10^99 as in your video at 6:51. The number of operations for one loop of modulo multiplication is 99, which is the same as the pattern length. In this way, the time complexity will be O (m*n) with larger constants than the brute force method. This makes using rolling hash pretty useless. Maybe we should sought to other hash functions when the pattern length is too long?
Very nice explanation. I had a question at minute 6 when you explaining about using big numbers. You talk about (10^99) % 113 which works great, however this does not seem to match the initial formula you gave. For example (10^99 + 10^98 + 10^96 ..... 10^0) % 113. Your initial hash formula applies mod only at the end after summation and so we still need to hold the value of a really large number ? And if you say, we change the function to apply the mod at each step before summation, then wouldn't the rolling function also change ? Thanks in advance.
That's a fair point. The challenge for me is balancing the length of the video with the amount of interesting tangential information to include. Let me direct you to this post that you might find interesting as it answers the questions "Why should hash functions use a prime number modulus?" stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus There is a pretty good explanation that talks about hash values being used as keys in hash tables, which have some fixed number of "buckets". You want your hash function to generate keys that distribute as uniformly as possible over the buckets of the hash table, no matter what the input is to the hash function. Hence the use of prime numbers, so as to avoid common multiples between buckets and keys. Further reading: cs.stackexchange.com/questions/11029/why-is-it-best-to-use-a-prime-number-as-a-mod-in-a-hashing-function/64191#64191
Thanks for the great video, helped me out a bunch. I did have one issue, though: The term "h(xyz...) - x*B^p mod(Q)" (seen at 6:00, as "h(abcd) - 1*10^3 mod(113)") can sometimes be negative, which ruins the hash. I struggled with getting it to work until I found the trick elsewhere online of adding the mod number in the same term: "h(xyz...) + Q - x*B^p mod(Q)" Is there a reason this wasn't addressed in the video? I'm wondering if maybe it IS addressed but I just missed that part of the video, or if there's a better solution that the creator of the video had in mind. Thank you.
Good question. Trying to keep the video short is the main reason why I didn't touch on that subject. But you are right, you could end up with a negative number. However, does that actually "ruins" the hash? =) As weird as it sounds, it's actually OK to take a MOD of a negative number. Negative numbers map to positive ones in modular arithmetic. For example, -2 in mod 10 is the same as +8 in mod 10, or +18 mod 10. So it's just like going counter-clockwise on the dial of numbers that goes from 0...9. So you could keep it as a negative number or, as you mentioned, you could convert it to a positive number is so desired.
@@stablesort Wow, thanks for the reply! Okay, I think I understand now. And so for a language like Python it will keep the mod positive automatically, but for Java I would have to do that myself if that is desired. Thanks again!
Thanks for posting a question. Either is just fine. I happen to have picked 113. By the way, in "real world" implementation you should probably pick a larger prime, perhaps 5 or 6 digits long, depending on the data set.
I went from not understanding to not understanding. But that's not ur fault. I'm just not at this level yet. Do you have any suggestion of an online course or roadmap that covers these kind of algorithms that I can build my knowledge on?
Good question... How much programming (and schooling) experience have you had so far? By the way, I'd guess that the intricacies of hash function implementations are probably not talked about until at least the 2nd year of college CS undergraduate courses.
Stable Sort I’m been doing for about 4 years. I have finished the AP CS A curriculum and want to dip my toes into some more complicated algorithms and competitive coding. I’m also going through rob Edward’s data structure playlist on RUclips right now since we’re all in quarantine.
@@albert_chen I took a quick look at Rob Edward's channel and it looks pretty good. I don't know if there is a channel that comprehensively teaches more complicated algorithms. When I was taking the algorithms course in college, the book we used was "Introduction to Algorithms" www.amazon.com/Introduction-Algorithms-Press-Thomas-Cormen-ebook/dp/B007CNRCAO/ref=sr_1_4 which I still reference every now and then. As far as competitive coding, I'd say just practice on websites like spoj.com
Sorry for the late reply. For most of my videos I usually do implement the full algorithm before making the video itself. But in this case I was too lazy to do it. But may be try this one: github.com/lemire/rollinghashjava (full disclosure - that is not my code, it's just the top result from a google search)
That was more clearly stated than a hungover Nancy Pelosi could have done, even if her denture adhesive were working. Good job Andre! Nicely edited, no hemming and hawing, moved right along, clear graphics, points not belabored... I still think a vid about hashish rolling might be interesting though :)
Ha! There is just no getting away from politics... On that note, this is now to flip electoral districts in O(log N) running time complexity: ruclips.net/video/oAR_EYd8im0/видео.html
Why not simply use the first and last character and check the rest only if there is a match on the first and the last?? Matching the entire 100 chars is stupid - there can be too many combinations with the same end value!
Matching the first and last character before moving on to others is really no different from matching the first and second character before moving on to others =) The benefit of computing the rolling hash for a given window is that the cost of doing it is fixed, regardless of the size of the window. In effect, allowing you to compare all of the characters in the window at once. Of course, in practical implementations, there is small chance of collision, which you then handle separately. But still, the chance of the collision is much much lower than the chance of two characters matching up (be it the first and last or whatever pair you choose).
@@stablesort Thank you for the detailed explanation. But I think statistically the chance to match the first and the last characters of a 100 character substring are less than the chance to match the first 2 characters or any given 2 consecutive characters. If I would do the function, I will split it into 3 steps: 1) run match for the first and last char and store the result; 2) calculate the total sum of each string from the prev step; and 3) match the results with the correct sum with the searched string. This way all the mods can be skipped. Excuse me if this is a stupid idea but the 'like' function in Oracle practically doesn't work of a few billion records, so I think someone should definitely improve this...
@@pepi1442 I am curious to know why is the chance of matching the fist and last characters is different from matching on first and second (or any other pair for that matter)? Are you basing this on word patters of English language? I am think about this more from the standpoint of random strings, where the frequency of one character following some other one is the same. The practical implementation aspect of this is an interesting discussion in its own right. Last time I checked the string.indexOf(...) function in Java doesn't do anything sophisticated at all. it literally does a bruit force char by char comparison and this is the default implementation of the language... By the way, the LIKE function in Oracle on a few billion records is slow probably for another reason - the problem there is having to fetch all of the rows from disk. If there is no index defined on the specific column, the query will end up doing a full table scan. So even if the string comparison function itself is fast, the whole process is still going to be very slow. So yeah, creating an index on that column usually solves the problem.
@@stablesort Some food for thought from the practical aspect :) I am trying to figure out how to describe it mathematically but I believe that in a text in any given language the chance to meet and 2 consecutive characters are higher than the chance to meet them separated with 100 characters. Imagine how many times you will meet let's say "ab" in a book and how many times you will have "a" and "b" separated by 100 characters. And using this approach, increasing the length of the search string doesn't bring any complexity, but on the contrary, the search will become much faster the longer the searched string is. The probability to meet "a" and "b" separated by 10000 characters will be way less than the probability to find them separated by 100 characters. string.indexOf(...) is simple as it is normally used on array values that are sent from the backend, eg. it is not supposed to run on large scale data. Indexing is not a panacea - it just creates a new table that is sorted in the indexed column(s). For searching a string it will have almost 0 effect if the column is indexed, you still need to do a full scan and match each string value one by one. This brings me to another plus of my solution - values that have smaller length the searched string will be discarded without having to calculate the mods. Fetching the rows from the disk is faster than matching the string, I can tell from practice that exact string match works super fast as opposed to the partial string match on the same large data. Anyway, thanks for the nice discussion.
That's an interesting idea and I suppose not that difficult to test - grab some big text file and do some number crunching. You are right about the indexing - for some reason I was thinking that your query uses "... LIKE 'abc%' ..." , in which case the index does help since the beginning x number of characters is indexed. But, now that I think about it (having slept a little), you were discussing queries of the form ".... LIKE '%abc%' ...", in which case the index is indeed useless. From what I understand, this is still an unsolved problem in the industry - searching for some string in the middle of long text columns. Thanks for the food for thought =)
Easily top 5 most eloquent algorithm teachers on RUclips, thank you for the amazing lesson!!
Thank you!!! I'll do my best to keep it up
After searching and reading through a lot of sources, this is the only video that explains every detail clearly! Really good explanation!! Thanks.
Thank you for such a wonderful compliment!
Dude you killed this topic. No one explained this topic in detail other than you
Thanks! That was my primary motivation for making the video - having implemented the hash myself, I found a whole lot of details that aren't all that obvious and worth discussing.
Best video to include all details! Thanks a lot !
I searched for a long time for different materials on this algorithm, until I came across your video - it is the best explanation, thank you!
You are very welcome! And thanks for the good words!
Brilliant. Please keep making videos
very crisp explanation .
I rarely comment on videos, but this was an extremely well explanation so I had to write something. I hope you make more videos. Thanks for this!
I appreciate that!
This is one of the best videos I've seen to understand algorithms. Sir, thank you so much for this...and please upload more often!!
Thank you, I will!
You are so amazing! I was struggling with understanding Rabin kerp and rolling hash but now it’s very clear! Thank you so much!!!
Wow, that was marvelous. You are an outstanding algorithm teacher. You give people just the right amount of explanation necessary for understanding a concept. Your videos are not long and not short; they are just the perfect length. Well done on that, and please keep going. The community needs more content like this one.
This content deserves millions of subs, I hope you make more videos soon!
Super clear explanation to a complete beginner - was following MIT's course and was lost but this video completely breaks down the intuition behind why a rolling hash is computed this way! thanks
Really well articulated - thank you!
Thanks for the compliment! Cheers!
I love your content and simple explanations, they are really helpful
Great! Thanks for the compliment!
Thanks! I really liked your explanation!
Great! Thanks for the compliment!
Your sanity check completely cleared my doubt thanks🇮🇳
Glad to hear it! Cheers!
Great video...concise yet clear...looking forward to the KMP explanation video..cheers
Hi Piyush, I posted a video on KMP string search algorithm recently. In case you haven't seen it, here is the link: ruclips.net/video/8xWrNQOC1Ts/видео.html
Let me know what you think!
Thanks for yet another awesome video. If possible, please consider making more videos on dynamic programming. Thanks!
@@piyushmishra2348 Thanks for feedback and the suggestion to make more DP related videos, I do appreciate it. I have a few dynamic programming topics planned, but if you have anything specific, just let me know! =)
What a gem, very good video.
Thanks for the good words!
Great video, definitely deserves more views.
Thanks for the compliment!
Very good explanation. Simple and to the point!
This is incredible! amazing work, to the point, funny, and informative!
Please never stop making videos!
Thanks! I do have a bunch more videos to make from my to-do list. By the way, if you have topic suggests, please do let me know. Cheers!
@@stablesort I went on a Computer Science blitz for the past two days, and couldn't find a reliable explanation on Big numbers.
How to represent big numbers, multiply them(Karatsuba Multiplication for instance), add, raise them and divide them would be a good video to consider!
Another topics that I couldn't find good explanations on are 2-4 Trees, and B-trees.
Thanks for your reply!
@@ale-lx9gp thanks for the tip on the lack of good tutorials on those subjects. The best part is that I am only vaguely familiar with them, so it'll get me to research and learn a bunch in the process. Much thanks!
@@stablesort My pleasure!
Thanks for the video sir.
you are very welcome!
Great video! Thank you for your explanation!
Thanks for leaving a compliment!
amazing clear video, love u
Cheers!
Excellent Video! Thanks.
Thanks! Glad to hear that you enjoyed it!
@@stablesort What happens when the hash value becomes negative when applying the rolling hash principle? Do we have to add the prime number (here 113) to it? Correct me If I'm wrong.
@@yagamiram Good question. You are right, you could end up with a negative number. However, as weird as it sounds, it's actually OK to take a MOD of a negative number. Negative numbers map to positive ones in modular arithmetic. For example, -2 in mod 10 is the same as +8 in mod 10, or +18 mod 10. So it's just like going counter-clockwise on the dial of numbers that goes from 0...9. So you could keep it as a negative number or, as you mentioned, you could convert it to a positive number if so desired.
good vid, you deserved this comment for YT algo
Really needed this short
Good luck!
Super video! Such a clear explanation and funny, too!
Thank you, Mie, for watching and for leaving a nice comment! I do appreciate it.
Best video on Rabin-Karp. Very nicely explained.
This is super useful! Great to know this algorithm. Looking forward to learning more!
Thanks for watching! I am glad you liked it!
Great explanation , I like this video
Great Explanation! Thank you! I hope you will keep working on the channel !
Thanks for the compliment! Glad to hear that it was helpful. I'll be putting up more episodes soon =)
Best explanation
Great! An useful algorithm.
Thank you so much!
Yes, pleeeaaaase! That's how it's done.
thank u very much
Glad to see your liked the video =)
Thank you sir.. It was very clear and concise explanation.
You are welcome =)
Thank you Sir!
Thank you for the awesome video !! :)
New episodes are on the way. Stay tuned! By the way, if there is a topic that you'd like me to cover, please do let me know. Thanks for watching!
good vid.
1 disliker is Brute Force Algorithm.
=) I'd add KMP to that: ruclips.net/video/EL4ZbRF587g/видео.html
@@stablesort dislikers came here for the other kind of rolling hash 🤣 Your tutorials are amazing!!
@@y.violetguo5305 =) Haha, thanks for the compliment!
I love the intro 😂🤣😂
=)
Thank you very much I really appreciate it.
wonderful explanation sir! thanks a lot
Thanks for the compliment! Please spread the good word =)
Thanks. Interesting and funny :)
Glad you enjoyed it!
Thanks! That helped!
Glad to hear it!
why can't you be my professor, you are so much better at explaining things
=) thanks for a wonderful compliment!
Great explanation on Rabin-Karp algorithm. One question I have is that the modulo multiplication for very large numbers, for example, 10^99 as in your video at 6:51. The number of operations for one loop of modulo multiplication is 99, which is the same as the pattern length. In this way, the time complexity will be O (m*n) with larger constants than the brute force method. This makes using rolling hash pretty useless. Maybe we should sought to other hash functions when the pattern length is too long?
Very nice explanation. I had a question at minute 6 when you explaining about using big numbers. You talk about (10^99) % 113 which works great, however this does not seem to match the initial formula you gave. For example (10^99 + 10^98 + 10^96 ..... 10^0) % 113. Your initial hash formula applies mod only at the end after summation and so we still need to hold the value of a really large number ? And if you say, we change the function to apply the mod at each step before summation, then wouldn't the rolling function also change ? Thanks in advance.
Amazing!
Thanks!
Awesome
Thanks!
Great video. It would be better if you can talk about how to choose the base and the module number and why.
That's a fair point. The challenge for me is balancing the length of the video with the amount of interesting tangential information to include.
Let me direct you to this post that you might find interesting as it answers the questions "Why should hash functions use a prime number modulus?"
stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus
There is a pretty good explanation that talks about hash values being used as keys in hash tables, which have some fixed number of "buckets". You want your hash function to generate keys that distribute as uniformly as possible over the buckets of the hash table, no matter what the input is to the hash function. Hence the use of prime numbers, so as to avoid common multiples between buckets and keys. Further reading:
cs.stackexchange.com/questions/11029/why-is-it-best-to-use-a-prime-number-as-a-mod-in-a-hashing-function/64191#64191
Thanks for the great video, helped me out a bunch. I did have one issue, though:
The term "h(xyz...) - x*B^p mod(Q)" (seen at 6:00, as "h(abcd) - 1*10^3 mod(113)") can sometimes be negative, which ruins the hash. I struggled with getting it to work until I found the trick elsewhere online of adding the mod number in the same term:
"h(xyz...) + Q - x*B^p mod(Q)"
Is there a reason this wasn't addressed in the video? I'm wondering if maybe it IS addressed but I just missed that part of the video, or if there's a better solution that the creator of the video had in mind. Thank you.
Good question. Trying to keep the video short is the main reason why I didn't touch on that subject. But you are right, you could end up with a negative number. However, does that actually "ruins" the hash? =) As weird as it sounds, it's actually OK to take a MOD of a negative number. Negative numbers map to positive ones in modular arithmetic. For example, -2 in mod 10 is the same as +8 in mod 10, or +18 mod 10. So it's just like going counter-clockwise on the dial of numbers that goes from 0...9. So you could keep it as a negative number or, as you mentioned, you could convert it to a positive number is so desired.
@@stablesort Wow, thanks for the reply! Okay, I think I understand now. And so for a language like Python it will keep the mod positive automatically, but for Java I would have to do that myself if that is desired. Thanks again!
I subscribed for more such useful video.
Thanks and welcome!
Your amazing.
Now I understand thanx
Thanks posting a compliment!
6:49 - why is this for loop equivalent to 10^99 mod 113 and what is the role of n?
You know, you got an off by one in the name! It's Rabin-Karp (or Karp-Rabin) named after Michael Rabin and Richard Karp.
Thanks for pointing out the spelling mistake. Guilty as charged.
Q: Why was the horse anxious?
A: It was going through a stable sort.
(Lame) jokes aside, thanks for the clear explanation!
Hehe
Nice video...!!
l like the way u explain concep..
Thanks for the compliment! I am glad to hear that you enjoyed it!
The thing I lack in this video is justification why KNP is performing better. Other than that, great video
is there a good explanation of choosing 113 as the prime not 101?
Thanks for posting a question. Either is just fine. I happen to have picked 113. By the way, in "real world" implementation you should probably pick a larger prime, perhaps 5 or 6 digits long, depending on the data set.
lol. "always roll your hash responsibly"
=)
I went from not understanding to not understanding. But that's not ur fault. I'm just not at this level yet. Do you have any suggestion of an online course or roadmap that covers these kind of algorithms that I can build my knowledge on?
Good question... How much programming (and schooling) experience have you had so far? By the way, I'd guess that the intricacies of hash function implementations are probably not talked about until at least the 2nd year of college CS undergraduate courses.
Stable Sort I’m been doing for about 4 years. I have finished the AP CS A curriculum and want to dip my toes into some more complicated algorithms and competitive coding. I’m also going through rob Edward’s data structure playlist on RUclips right now since we’re all in quarantine.
@@albert_chen I took a quick look at Rob Edward's channel and it looks pretty good. I don't know if there is a channel that comprehensively teaches more complicated algorithms. When I was taking the algorithms course in college, the book we used was "Introduction to Algorithms"
www.amazon.com/Introduction-Algorithms-Press-Thomas-Cormen-ebook/dp/B007CNRCAO/ref=sr_1_4
which I still reference every now and then. As far as competitive coding, I'd say just practice on websites like spoj.com
Stable Sort thanks for the reply, will do.
Very good question. Thankful for the answers given too.
Can you please provide the slides?
Hash responsibly !
=P
Hi, is the code available?
Sorry for the late reply. For most of my videos I usually do implement the full algorithm before making the video itself. But in this case I was too lazy to do it. But may be try this one:
github.com/lemire/rollinghashjava
(full disclosure - that is not my code, it's just the top result from a google search)
That was more clearly stated than a hungover Nancy Pelosi could have done, even if her denture adhesive were working.
Good job Andre! Nicely edited, no hemming and hawing, moved right along, clear graphics, points not belabored...
I still think a vid about hashish rolling might be interesting though :)
Ha! There is just no getting away from politics... On that note, this is now to flip electoral districts in O(log N) running time complexity: ruclips.net/video/oAR_EYd8im0/видео.html
Why not simply use the first and last character and check the rest only if there is a match on the first and the last?? Matching the entire 100 chars is stupid - there can be too many combinations with the same end value!
Matching the first and last character before moving on to others is really no different from matching the first and second character before moving on to others =) The benefit of computing the rolling hash for a given window is that the cost of doing it is fixed, regardless of the size of the window. In effect, allowing you to compare all of the characters in the window at once. Of course, in practical implementations, there is small chance of collision, which you then handle separately. But still, the chance of the collision is much much lower than the chance of two characters matching up (be it the first and last or whatever pair you choose).
@@stablesort Thank you for the detailed explanation. But I think statistically the chance to match the first and the last characters of a 100 character substring are less than the chance to match the first 2 characters or any given 2 consecutive characters. If I would do the function, I will split it into 3 steps: 1) run match for the first and last char and store the result; 2) calculate the total sum of each string from the prev step; and 3) match the results with the correct sum with the searched string. This way all the mods can be skipped. Excuse me if this is a stupid idea but the 'like' function in Oracle practically doesn't work of a few billion records, so I think someone should definitely improve this...
@@pepi1442 I am curious to know why is the chance of matching the fist and last characters is different from matching on first and second (or any other pair for that matter)? Are you basing this on word patters of English language? I am think about this more from the standpoint of random strings, where the frequency of one character following some other one is the same.
The practical implementation aspect of this is an interesting discussion in its own right. Last time I checked the string.indexOf(...) function in Java doesn't do anything sophisticated at all. it literally does a bruit force char by char comparison and this is the default implementation of the language... By the way, the LIKE function in Oracle on a few billion records is slow probably for another reason - the problem there is having to fetch all of the rows from disk. If there is no index defined on the specific column, the query will end up doing a full table scan. So even if the string comparison function itself is fast, the whole process is still going to be very slow. So yeah, creating an index on that column usually solves the problem.
@@stablesort Some food for thought from the practical aspect :) I am trying to figure out how to describe it mathematically but I believe that in a text in any given language the chance to meet and 2 consecutive characters are higher than the chance to meet them separated with 100 characters. Imagine how many times you will meet let's say "ab" in a book and how many times you will have "a" and "b" separated by 100 characters. And using this approach, increasing the length of the search string doesn't bring any complexity, but on the contrary, the search will become much faster the longer the searched string is. The probability to meet "a" and "b" separated by 10000 characters will be way less than the probability to find them separated by 100 characters. string.indexOf(...) is simple as it is normally used on array values that are sent from the backend, eg. it is not supposed to run on large scale data. Indexing is not a panacea - it just creates a new table that is sorted in the indexed column(s). For searching a string it will have almost 0 effect if the column is indexed, you still need to do a full scan and match each string value one by one. This brings me to another plus of my solution - values that have smaller length the searched string will be discarded without having to calculate the mods. Fetching the rows from the disk is faster than matching the string, I can tell from practice that exact string match works super fast as opposed to the partial string match on the same large data. Anyway, thanks for the nice discussion.
That's an interesting idea and I suppose not that difficult to test - grab some big text file and do some number crunching.
You are right about the indexing - for some reason I was thinking that your query uses "... LIKE 'abc%' ..." , in which case the index does help since the beginning x number of characters is indexed. But, now that I think about it (having slept a little), you were discussing queries of the form ".... LIKE '%abc%' ...", in which case the index is indeed useless. From what I understand, this is still an unsolved problem in the industry - searching for some string in the middle of long text columns. Thanks for the food for thought =)
Not that kind of Hash