Rolling Hash Function Tutorial, used by Rabin-Karp String Searching Algorithm

Поделиться
HTML-код
  • Опубликовано: 11 сен 2024

Комментарии • 136

  • @tanyakejriwal6020
    @tanyakejriwal6020 4 года назад +53

    After searching and reading through a lot of sources, this is the only video that explains every detail clearly! Really good explanation!! Thanks.

    • @stablesort
      @stablesort  4 года назад +1

      Thank you for such a wonderful compliment!

  • @TANEM315
    @TANEM315 3 года назад +28

    Easily top 5 most eloquent algorithm teachers on RUclips, thank you for the amazing lesson!!

    • @stablesort
      @stablesort  3 года назад

      Thank you!!! I'll do my best to keep it up

    • @robertmeis887
      @robertmeis887 2 года назад

      Agreed. Informative, very clear video, with a sprinkle of humor.

  • @illyafedyniak2947
    @illyafedyniak2947 3 года назад +8

    I searched for a long time for different materials on this algorithm, until I came across your video - it is the best explanation, thank you!

    • @stablesort
      @stablesort  3 года назад

      You are very welcome! And thanks for the good words!

  • @sreeram242
    @sreeram242 3 года назад +1

    Dude you killed this topic. No one explained this topic in detail other than you

    • @stablesort
      @stablesort  3 года назад

      Thanks! That was my primary motivation for making the video - having implemented the hash myself, I found a whole lot of details that aren't all that obvious and worth discussing.

  • @dashdoom8452
    @dashdoom8452 21 день назад

    This content deserves millions of subs, I hope you make more videos soon!

  • @joeellis7880
    @joeellis7880 3 года назад +3

    I rarely comment on videos, but this was an extremely well explanation so I had to write something. I hope you make more videos. Thanks for this!

  • @vaanibansal7621
    @vaanibansal7621 3 года назад +4

    This is one of the best videos I've seen to understand algorithms. Sir, thank you so much for this...and please upload more often!!

  • @moyo14
    @moyo14 Год назад +2

    You are so amazing! I was struggling with understanding Rabin kerp and rolling hash but now it’s very clear! Thank you so much!!!

  • @kushagrashekhawat8227
    @kushagrashekhawat8227 3 года назад +6

    I love your content and simple explanations, they are really helpful

    • @stablesort
      @stablesort  3 года назад

      Great! Thanks for the compliment!

  • @MieA123
    @MieA123 2 года назад +1

    Best video to include all details! Thanks a lot !

  • @hq_films
    @hq_films 2 года назад

    Super clear explanation to a complete beginner - was following MIT's course and was lost but this video completely breaks down the intuition behind why a rolling hash is computed this way! thanks

  • @josephantoniou3778
    @josephantoniou3778 3 года назад +5

    Really well articulated - thank you!

    • @stablesort
      @stablesort  3 года назад

      Thanks for the compliment! Cheers!

  • @amaama4140
    @amaama4140 5 месяцев назад

    Wow, that was marvelous. You are an outstanding algorithm teacher. You give people just the right amount of explanation necessary for understanding a concept. Your videos are not long and not short; they are just the perfect length. Well done on that, and please keep going. The community needs more content like this one.

  • @marcrautmann1826
    @marcrautmann1826 2 года назад +1

    good vid, you deserved this comment for YT algo

  • @erdeneboldbattulga6438
    @erdeneboldbattulga6438 3 года назад +3

    Thanks! I really liked your explanation!

    • @stablesort
      @stablesort  3 года назад

      Great! Thanks for the compliment!

  • @justinlam5565
    @justinlam5565 3 года назад +2

    Great video, definitely deserves more views.

    • @stablesort
      @stablesort  3 года назад

      Thanks for the compliment!

  • @34_harshdeepraghuwanshi98
    @34_harshdeepraghuwanshi98 3 года назад +1

    Your sanity check completely cleared my doubt thanks🇮🇳

    • @stablesort
      @stablesort  3 года назад

      Glad to hear it! Cheers!

  • @GANDHIXtv
    @GANDHIXtv 11 месяцев назад

    Best video on Rabin-Karp. Very nicely explained.

  • @piyushmishra2348
    @piyushmishra2348 4 года назад +3

    Great video...concise yet clear...looking forward to the KMP explanation video..cheers

    • @stablesort
      @stablesort  4 года назад +2

      Hi Piyush, I posted a video on KMP string search algorithm recently. In case you haven't seen it, here is the link: ruclips.net/video/8xWrNQOC1Ts/видео.html
      Let me know what you think!

    • @piyushmishra2348
      @piyushmishra2348 4 года назад +1

      Thanks for yet another awesome video. If possible, please consider making more videos on dynamic programming. Thanks!

    • @violentyev
      @violentyev 4 года назад

      @@piyushmishra2348 Thanks for feedback and the suggestion to make more DP related videos, I do appreciate it. I have a few dynamic programming topics planned, but if you have anything specific, just let me know! =)

  • @niharika6967
    @niharika6967 2 года назад +1

    very crisp explanation .

  • @LuizDahoraavida
    @LuizDahoraavida 3 года назад +2

    What a gem, very good video.

    • @stablesort
      @stablesort  3 года назад

      Thanks for the good words!

  • @michaelgirma8688
    @michaelgirma8688 Год назад +1

    Brilliant. Please keep making videos

  • @MuhidAbid14point75
    @MuhidAbid14point75 3 года назад +1

    Really needed this short

  • @natetrek
    @natetrek 4 года назад +2

    This is super useful! Great to know this algorithm. Looking forward to learning more!

    • @stablesort
      @stablesort  4 года назад

      Thanks for watching! I am glad you liked it!

  • @mielewis3893
    @mielewis3893 4 года назад +3

    Super video! Such a clear explanation and funny, too!

    • @stablesort
      @stablesort  4 года назад

      Thank you, Mie, for watching and for leaving a nice comment! I do appreciate it.

  • @ale-lx9gp
    @ale-lx9gp 3 года назад +1

    This is incredible! amazing work, to the point, funny, and informative!
    Please never stop making videos!

    • @stablesort
      @stablesort  3 года назад +1

      Thanks! I do have a bunch more videos to make from my to-do list. By the way, if you have topic suggests, please do let me know. Cheers!

    • @ale-lx9gp
      @ale-lx9gp 3 года назад +1

      @@stablesort I went on a Computer Science blitz for the past two days, and couldn't find a reliable explanation on Big numbers.
      How to represent big numbers, multiply them(Karatsuba Multiplication for instance), add, raise them and divide them would be a good video to consider!
      Another topics that I couldn't find good explanations on are 2-4 Trees, and B-trees.
      Thanks for your reply!

    • @stablesort
      @stablesort  3 года назад +1

      @@ale-lx9gp thanks for the tip on the lack of good tutorials on those subjects. The best part is that I am only vaguely familiar with them, so it'll get me to research and learn a bunch in the process. Much thanks!

    • @ale-lx9gp
      @ale-lx9gp 3 года назад

      @@stablesort My pleasure!

  • @leonardocalderon2365
    @leonardocalderon2365 2 года назад +1

    Great video! Thank you for your explanation!

    • @stablesort
      @stablesort  2 года назад

      Thanks for leaving a compliment!

  • @kevinsmirnov264
    @kevinsmirnov264 3 года назад +1

    Yes, pleeeaaaase! That's how it's done.
    thank u very much

    • @stablesort
      @stablesort  3 года назад

      Glad to see your liked the video =)

  • @someaknea
    @someaknea 2 года назад +1

    Great explanation , I like this video

  • @adhishmalviya2408
    @adhishmalviya2408 4 года назад +20

    1 disliker is Brute Force Algorithm.

    • @stablesort
      @stablesort  4 года назад +1

      =) I'd add KMP to that: ruclips.net/video/EL4ZbRF587g/видео.html

    • @y.violetguo5305
      @y.violetguo5305 3 года назад +1

      @@stablesort dislikers came here for the other kind of rolling hash 🤣 Your tutorials are amazing!!

    • @stablesort
      @stablesort  3 года назад

      @@y.violetguo5305 =) Haha, thanks for the compliment!

  • @archiefayzullaev4585
    @archiefayzullaev4585 3 года назад +1

    Great Explanation! Thank you! I hope you will keep working on the channel !

    • @stablesort
      @stablesort  3 года назад +1

      Thanks for the compliment! Glad to hear that it was helpful. I'll be putting up more episodes soon =)

  • @adityaaditya7286
    @adityaaditya7286 Год назад +1

    Best explanation

  • @orzveggie337
    @orzveggie337 4 года назад +4

    Great! An useful algorithm.

  • @uselesvideo
    @uselesvideo 4 года назад +2

    Thank you for the awesome video !! :)

    • @stablesort
      @stablesort  4 года назад

      New episodes are on the way. Stay tuned! By the way, if there is a topic that you'd like me to cover, please do let me know. Thanks for watching!

  • @yagamiram
    @yagamiram 3 года назад +1

    Excellent Video! Thanks.

    • @stablesort
      @stablesort  3 года назад

      Thanks! Glad to hear that you enjoyed it!

    • @yagamiram
      @yagamiram 3 года назад +1

      @@stablesort What happens when the hash value becomes negative when applying the rolling hash principle? Do we have to add the prime number (here 113) to it? Correct me If I'm wrong.

    • @stablesort
      @stablesort  3 года назад +1

      @@yagamiram Good question. You are right, you could end up with a negative number. However, as weird as it sounds, it's actually OK to take a MOD of a negative number. Negative numbers map to positive ones in modular arithmetic. For example, -2 in mod 10 is the same as +8 in mod 10, or +18 mod 10. So it's just like going counter-clockwise on the dial of numbers that goes from 0...9. So you could keep it as a negative number or, as you mentioned, you could convert it to a positive number if so desired.

  • @annawolarz7136
    @annawolarz7136 2 года назад

    Very good explanation. Simple and to the point!

  • @adityagoyal50
    @adityagoyal50 4 года назад +1

    Thank you sir.. It was very clear and concise explanation.

  • @user-tb4bs9po9b
    @user-tb4bs9po9b 4 года назад +2

    why can't you be my professor, you are so much better at explaining things

    • @stablesort
      @stablesort  4 года назад

      =) thanks for a wonderful compliment!

  • @HarshaNadendla
    @HarshaNadendla 2 года назад +1

    Thanks for the video sir.

  • @OverNothingbupt
    @OverNothingbupt 2 года назад +1

    amazing clear video, love u

  • @shashankagarwal9301
    @shashankagarwal9301 2 года назад +1

    Thank you so much!

  • @windsonyang1441
    @windsonyang1441 4 года назад +1

    Great video. It would be better if you can talk about how to choose the base and the module number and why.

    • @stablesort
      @stablesort  4 года назад

      That's a fair point. The challenge for me is balancing the length of the video with the amount of interesting tangential information to include.
      Let me direct you to this post that you might find interesting as it answers the questions "Why should hash functions use a prime number modulus?"
      stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus
      There is a pretty good explanation that talks about hash values being used as keys in hash tables, which have some fixed number of "buckets". You want your hash function to generate keys that distribute as uniformly as possible over the buckets of the hash table, no matter what the input is to the hash function. Hence the use of prime numbers, so as to avoid common multiples between buckets and keys. Further reading:
      cs.stackexchange.com/questions/11029/why-is-it-best-to-use-a-prime-number-as-a-mod-in-a-hashing-function/64191#64191

  • @cookiemonster3699
    @cookiemonster3699 2 года назад +2

    Q: Why was the horse anxious?
    A: It was going through a stable sort.
    (Lame) jokes aside, thanks for the clear explanation!

  • @yotubecreators47
    @yotubecreators47 Год назад

    Thank you very much I really appreciate it.

  • @miladiouss
    @miladiouss 3 года назад +2

    I love the intro 😂🤣😂

  • @go_better
    @go_better 4 года назад +2

    Thanks. Interesting and funny :)

  • @krzysztofstrojny7546
    @krzysztofstrojny7546 Год назад

    The thing I lack in this video is justification why KNP is performing better. Other than that, great video

  • @wrutzreaction7199
    @wrutzreaction7199 4 года назад

    wonderful explanation sir! thanks a lot

    • @stablesort
      @stablesort  4 года назад

      Thanks for the compliment! Please spread the good word =)

  • @c1q3
    @c1q3 2 года назад +1

    good vid.

  • @shawnfrank5303
    @shawnfrank5303 2 года назад +1

    Very nice explanation. I had a question at minute 6 when you explaining about using big numbers. You talk about (10^99) % 113 which works great, however this does not seem to match the initial formula you gave. For example (10^99 + 10^98 + 10^96 ..... 10^0) % 113. Your initial hash formula applies mod only at the end after summation and so we still need to hold the value of a really large number ? And if you say, we change the function to apply the mod at each step before summation, then wouldn't the rolling function also change ? Thanks in advance.

  • @AI7KTD
    @AI7KTD 4 года назад +2

    You know, you got an off by one in the name! It's Rabin-Karp (or Karp-Rabin) named after Michael Rabin and Richard Karp.

    • @stablesort
      @stablesort  4 года назад +1

      Thanks for pointing out the spelling mistake. Guilty as charged.

  • @abhishekavr1848
    @abhishekavr1848 4 года назад +1

    I subscribed for more such useful video.

  • @luyang7033
    @luyang7033 2 года назад

    Great explanation on Rabin-Karp algorithm. One question I have is that the modulo multiplication for very large numbers, for example, 10^99 as in your video at 6:51. The number of operations for one loop of modulo multiplication is 99, which is the same as the pattern length. In this way, the time complexity will be O (m*n) with larger constants than the brute force method. This makes using rolling hash pretty useless. Maybe we should sought to other hash functions when the pattern length is too long?

  • @manikandansritharan5501
    @manikandansritharan5501 3 года назад +1

    Thanks! That helped!

  • @javeedshareef4795
    @javeedshareef4795 4 года назад +2

    Nice video...!!
    l like the way u explain concep..

    • @stablesort
      @stablesort  4 года назад

      Thanks for the compliment! I am glad to hear that you enjoyed it!

  • @flamayo
    @flamayo 3 года назад +2

    Thanks for the great video, helped me out a bunch. I did have one issue, though:
    The term "h(xyz...) - x*B^p mod(Q)" (seen at 6:00, as "h(abcd) - 1*10^3 mod(113)") can sometimes be negative, which ruins the hash. I struggled with getting it to work until I found the trick elsewhere online of adding the mod number in the same term:
    "h(xyz...) + Q - x*B^p mod(Q)"
    Is there a reason this wasn't addressed in the video? I'm wondering if maybe it IS addressed but I just missed that part of the video, or if there's a better solution that the creator of the video had in mind. Thank you.

    • @stablesort
      @stablesort  3 года назад +3

      Good question. Trying to keep the video short is the main reason why I didn't touch on that subject. But you are right, you could end up with a negative number. However, does that actually "ruins" the hash? =) As weird as it sounds, it's actually OK to take a MOD of a negative number. Negative numbers map to positive ones in modular arithmetic. For example, -2 in mod 10 is the same as +8 in mod 10, or +18 mod 10. So it's just like going counter-clockwise on the dial of numbers that goes from 0...9. So you could keep it as a negative number or, as you mentioned, you could convert it to a positive number is so desired.

    • @flamayo
      @flamayo 3 года назад +1

      @@stablesort Wow, thanks for the reply! Okay, I think I understand now. And so for a language like Python it will keep the mod positive automatically, but for Java I would have to do that myself if that is desired. Thanks again!

  • @amanuel2135
    @amanuel2135 Год назад

    Your amazing.

  • @FirefoxGuy18
    @FirefoxGuy18 3 года назад

    Amazing!

  • @one-hp
    @one-hp 4 года назад +1

    Now I understand thanx

    • @stablesort
      @stablesort  4 года назад

      Thanks posting a compliment!

  • @krgaurav8551
    @krgaurav8551 4 года назад +1

    Awesome

  • @zuowang8819
    @zuowang8819 3 года назад +1

    is there a good explanation of choosing 113 as the prime not 101?

    • @stablesort
      @stablesort  3 года назад

      Thanks for posting a question. Either is just fine. I happen to have picked 113. By the way, in "real world" implementation you should probably pick a larger prime, perhaps 5 or 6 digits long, depending on the data set.

  • @coltennabers634
    @coltennabers634 2 года назад

    6:49 - why is this for loop equivalent to 10^99 mod 113 and what is the role of n?

  • @albert_chen
    @albert_chen 4 года назад +1

    I went from not understanding to not understanding. But that's not ur fault. I'm just not at this level yet. Do you have any suggestion of an online course or roadmap that covers these kind of algorithms that I can build my knowledge on?

    • @stablesort
      @stablesort  4 года назад

      Good question... How much programming (and schooling) experience have you had so far? By the way, I'd guess that the intricacies of hash function implementations are probably not talked about until at least the 2nd year of college CS undergraduate courses.

    • @albert_chen
      @albert_chen 4 года назад

      Stable Sort I’m been doing for about 4 years. I have finished the AP CS A curriculum and want to dip my toes into some more complicated algorithms and competitive coding. I’m also going through rob Edward’s data structure playlist on RUclips right now since we’re all in quarantine.

    • @stablesort
      @stablesort  4 года назад +1

      @@albert_chen I took a quick look at Rob Edward's channel and it looks pretty good. I don't know if there is a channel that comprehensively teaches more complicated algorithms. When I was taking the algorithms course in college, the book we used was "Introduction to Algorithms"
      www.amazon.com/Introduction-Algorithms-Press-Thomas-Cormen-ebook/dp/B007CNRCAO/ref=sr_1_4
      which I still reference every now and then. As far as competitive coding, I'd say just practice on websites like spoj.com

    • @albert_chen
      @albert_chen 4 года назад

      Stable Sort thanks for the reply, will do.

    • @cookiemonster3699
      @cookiemonster3699 2 года назад

      Very good question. Thankful for the answers given too.

  • @timothyebiuwhe5209
    @timothyebiuwhe5209 3 года назад +1

    lol. "always roll your hash responsibly"

  • @ZobiaRasheed
    @ZobiaRasheed Год назад

    Can you please provide the slides?

  • @erikviking471
    @erikviking471 4 года назад +1

    That was more clearly stated than a hungover Nancy Pelosi could have done, even if her denture adhesive were working.
    Good job Andre! Nicely edited, no hemming and hawing, moved right along, clear graphics, points not belabored...
    I still think a vid about hashish rolling might be interesting though :)

    • @stablesort
      @stablesort  4 года назад

      Ha! There is just no getting away from politics... On that note, this is now to flip electoral districts in O(log N) running time complexity: ruclips.net/video/oAR_EYd8im0/видео.html

  • @bismeetsingh352
    @bismeetsingh352 3 года назад

    Hi, is the code available?

    • @stablesort
      @stablesort  3 года назад

      Sorry for the late reply. For most of my videos I usually do implement the full algorithm before making the video itself. But in this case I was too lazy to do it. But may be try this one:
      github.com/lemire/rollinghashjava
      (full disclosure - that is not my code, it's just the top result from a google search)

  • @akashtyagi7182
    @akashtyagi7182 4 года назад

    Hash responsibly !

  • @pepi1442
    @pepi1442 4 года назад

    Why not simply use the first and last character and check the rest only if there is a match on the first and the last?? Matching the entire 100 chars is stupid - there can be too many combinations with the same end value!

    • @stablesort
      @stablesort  4 года назад

      Matching the first and last character before moving on to others is really no different from matching the first and second character before moving on to others =) The benefit of computing the rolling hash for a given window is that the cost of doing it is fixed, regardless of the size of the window. In effect, allowing you to compare all of the characters in the window at once. Of course, in practical implementations, there is small chance of collision, which you then handle separately. But still, the chance of the collision is much much lower than the chance of two characters matching up (be it the first and last or whatever pair you choose).

    • @pepi1442
      @pepi1442 4 года назад

      @@stablesort Thank you for the detailed explanation. But I think statistically the chance to match the first and the last characters of a 100 character substring are less than the chance to match the first 2 characters or any given 2 consecutive characters. If I would do the function, I will split it into 3 steps: 1) run match for the first and last char and store the result; 2) calculate the total sum of each string from the prev step; and 3) match the results with the correct sum with the searched string. This way all the mods can be skipped. Excuse me if this is a stupid idea but the 'like' function in Oracle practically doesn't work of a few billion records, so I think someone should definitely improve this...

    • @stablesort
      @stablesort  4 года назад

      @@pepi1442 I am curious to know why is the chance of matching the fist and last characters is different from matching on first and second (or any other pair for that matter)? Are you basing this on word patters of English language? I am think about this more from the standpoint of random strings, where the frequency of one character following some other one is the same.
      The practical implementation aspect of this is an interesting discussion in its own right. Last time I checked the string.indexOf(...) function in Java doesn't do anything sophisticated at all. it literally does a bruit force char by char comparison and this is the default implementation of the language... By the way, the LIKE function in Oracle on a few billion records is slow probably for another reason - the problem there is having to fetch all of the rows from disk. If there is no index defined on the specific column, the query will end up doing a full table scan. So even if the string comparison function itself is fast, the whole process is still going to be very slow. So yeah, creating an index on that column usually solves the problem.

    • @pepi1442
      @pepi1442 4 года назад

      @@stablesort Some food for thought from the practical aspect :) I am trying to figure out how to describe it mathematically but I believe that in a text in any given language the chance to meet and 2 consecutive characters are higher than the chance to meet them separated with 100 characters. Imagine how many times you will meet let's say "ab" in a book and how many times you will have "a" and "b" separated by 100 characters. And using this approach, increasing the length of the search string doesn't bring any complexity, but on the contrary, the search will become much faster the longer the searched string is. The probability to meet "a" and "b" separated by 10000 characters will be way less than the probability to find them separated by 100 characters. string.indexOf(...) is simple as it is normally used on array values that are sent from the backend, eg. it is not supposed to run on large scale data. Indexing is not a panacea - it just creates a new table that is sorted in the indexed column(s). For searching a string it will have almost 0 effect if the column is indexed, you still need to do a full scan and match each string value one by one. This brings me to another plus of my solution - values that have smaller length the searched string will be discarded without having to calculate the mods. Fetching the rows from the disk is faster than matching the string, I can tell from practice that exact string match works super fast as opposed to the partial string match on the same large data. Anyway, thanks for the nice discussion.

    • @stablesort
      @stablesort  4 года назад

      That's an interesting idea and I suppose not that difficult to test - grab some big text file and do some number crunching.
      You are right about the indexing - for some reason I was thinking that your query uses "... LIKE 'abc%' ..." , in which case the index does help since the beginning x number of characters is indexed. But, now that I think about it (having slept a little), you were discussing queries of the form ".... LIKE '%abc%' ...", in which case the index is indeed useless. From what I understand, this is still an unsolved problem in the industry - searching for some string in the middle of long text columns. Thanks for the food for thought =)

  • @himachal8252
    @himachal8252 2 месяца назад

    Not that kind of Hash