Rabin Karp Substring Search Pattern Matching

Поделиться
HTML-код
  • Опубликовано: 29 сен 2024
  • / tusharroy25
    github.com/mis...
    github.com/mis...
    github.com/mis...
    In computer science, the Rabin-Karp algorithm or Karp-Rabin algorithm is a string searching algorithm created by Richard M. Karp and Michael O. Rabin (1987) that uses hashing to find any one of a set of pattern strings in a text. For text of length n and p patterns of combined length m, its average and best case running time is O(n+m) in space O(p), but its worst-case time is O(nm). In contrast, the Aho-Corasick string matching algorithm has asymptotic worst-time complexity O(n+m) in space O(m).

Комментарии • 248

  • @chiragjindal6096
    @chiragjindal6096 3 года назад +27

    Paying 4 LPA college fees just to end up watching your free video lectures.

  • @sagarshah275
    @sagarshah275 8 лет назад

    Very nice explanation. Thanks!

  • @riverland0072
    @riverland0072 7 лет назад +26

    I think this is wrong. You are not doing the required modular arithmetic to reduce the false positives. The method will work but this is not the complete algorithm. Someone should prove me wrong on this.

    • @aneeshcrao
      @aneeshcrao 6 лет назад +2

      Yeah clearly, it is incomplete.

    • @aneeshcrao
      @aneeshcrao 6 лет назад +4

      He has not used to the table size of the hash (the prime number here) anywhere. You will need to find the modulo of the final hash value, otherwise you will get strings which hash to values greater then the prime number, which clearly won't fit in the Hash Table

    • @adhishmalviya2408
      @adhishmalviya2408 4 года назад

      Even 3 years after your comment, I am still with you.

    • @YashArora721
      @YashArora721 4 года назад

      Same here..this does not work for large numbers..the has becomes 0 for large numbers . There should be some modulo operation wrt to prime numbers

    • @harshdeepsinghful
      @harshdeepsinghful 3 года назад

      Yes this is incomplete....

  • @kamal-xd7id
    @kamal-xd7id 6 лет назад

    For bigger string, Hashcodes are going very big. Should we not use prime number as modulo also?

  • @ObomXD
    @ObomXD 8 лет назад

    Awesome video as always, I think I got everything, but how should I proceed when the pattern have like 10000 letters and the hash values can go very high?

    • @zhichaoh
      @zhichaoh 8 лет назад

      +Tushar Roy it is common to mod a prime to make the number bounded. Then, it is not an issue any more.

    • @ObomXD
      @ObomXD 8 лет назад

      +zhichaoh how this would work then? i mean to keep the rolling hash consistent and avoid some kind of collision

  • @drummatick
    @drummatick 6 лет назад

    the time complexity is wrong. It is O(m+n), because i only have to compute the hash O(n) times as i iterate and for only once i have to calculate the hash of the pattern, so kindly update the video

    • @geni3595
      @geni3595 4 года назад

      o(m+n-1 ) cause need not check the last character in pattern if the previous characters and hash match

  • @Suresh-Vuppala
    @Suresh-Vuppala 8 лет назад

    waiting 4 boyer moore alg..

  • @boredengineer2089
    @boredengineer2089 8 лет назад

    idk but I'm not satisfied although nice try tysm for uploading

  • @reehji
    @reehji 8 лет назад +59

    damn, this is 10x easier than trying to read the lemmas and proofs T_T

    • @kumartatsat868
      @kumartatsat868 3 года назад +4

      videos like these are good for knowing "how" things work. but proofs and lemmas help you to answer "why" this thing works

    • @JTST1234
      @JTST1234 3 года назад +1

      @@kumartatsat868 yea but you start ask knowing "how" things work then move on to "why" things work

  • @babajogiji
    @babajogiji 8 лет назад +15

    I didn't realize the power of rolling hash calculation unless applied it to longer text and patterns! Your 3-step calc is very good idea.

  • @duydangdroid
    @duydangdroid 7 лет назад +11

    understood perfectly after first example. May Allah reward you greatly for the work that you have done.

  • @raghavkhurana7110
    @raghavkhurana7110 8 лет назад +41

    You're good. Keep helping us out when our professors won't and god will bless you!

  • @kalaiandev5356
    @kalaiandev5356 8 лет назад +10

    I don't understand why are we using prime numbers here....Why can't we simply add their ASCII values

    • @ramansonipat
      @ramansonipat 8 лет назад +31

      Consider values of A=1, B=2, C=3
      If we don't use prime numbers the hash value for ABC and BCA are
      ABC = 1+2+3=6
      BCA = 2+3+1=6
      Hash values are colliding causing the search to be inefficient.Instead if you used prime numbersABC = 1 * 3^0 + 2 * 3^1 + 3 * 3^2 = 1+6+27 = 34 BCA = 2 * 3^0 + 3 * 3^1 + 1 * 3^2 = 2+9+9 = 20Hash values are different and can be differentiated even if they contain same number of characters.

    • @malharjajoo7393
      @malharjajoo7393 7 лет назад +1

      Does this mean that there can never be hash collisions ?? Also, how can you prove that ?

    • @Deepakyadav-jr3gk
      @Deepakyadav-jr3gk 7 лет назад +3

      No, it does not guarantee that hash will never collide.
      If hash collides then we will check element by element and if it matches then only pattern is find otherwise not.
      This is the worst case if our hash function is not good and hase values colliding each other and indeed checking for each character and pattern is not found.
      o((n-m+1)*m ).

    • @satadhi
      @satadhi 7 лет назад +1

      then how do you decide which hash function is better suited ?

    • @geni3595
      @geni3595 4 года назад

      @@satadhi
      Generally multiplying with a large number ensures hashes won't collide but for understanding purpose it's advisable to take a small value number

  • @SonuSonu-tk5pk
    @SonuSonu-tk5pk 7 лет назад +22

    ""Hello friends my name is..."
    its amazing

  • @kisgatyas
    @kisgatyas 8 лет назад +39

    Very good explanation of the topic. Thank you for your effort. :)

  • @amansinghrajput9445
    @amansinghrajput9445 Год назад +1

    Sir you very much look like a famous RUclipsr (THUGESH)

  • @mrfrozen97-despicable
    @mrfrozen97-despicable 3 года назад +1

    Why don't u use number of ascii values instead of a prime no.

  • @carloslaguna7036
    @carloslaguna7036 8 лет назад +3

    This is the first time I understand rolling hash functions. Great video! Keep up with the great work!

  • @sidhantakumarpattnaik2962
    @sidhantakumarpattnaik2962 8 лет назад +1

    Very nice and so simple explanation .somebody who not belong to software industry will also understand easily . keep posting these videos this really helps us .

  • @SearchingPeaceInside
    @SearchingPeaceInside 4 года назад +1

    Hey tushar, as per your algo we need to store mappings a=1,b=2,c=3 and so on. if input text contains 256 ASCII characters, maintaining mapping would be tough, I believe instead taking actual ASCII values of characters in pattern would be helpful. Again you have not told on what basis you have taken prime number(3), this is similar to converting string to base-3 number system. And in case of huge pattern string also if you take prime number as 101(as you told) it will make a huge number might overflow unsigned int boundaries as well. I believe using mod might be good. Also you did not told why 101 would be good, ie synonymous to converting number to base-101 number system. i belive taking smaller prime will lead to larger hash collisions and then lot of comparisons..

  • @anerdwitdacamera204
    @anerdwitdacamera204 4 года назад +1

    what if the substring length is like 100, the hash will exceed the integer even long limit

  • @АндрейИльин-д6ж
    @АндрейИльин-д6ж 3 года назад +1

    подарите ему кто нибудь компьютер

  • @KanagaveluSugumar
    @KanagaveluSugumar 8 лет назад +2

    Excellent Tushar! Now got the Rolling hash function :) And it is great that you are explaining about the real time application where it is used and its time complexities. It will also be great if you do explain about where it will fails as well as comparing with other equivalent algorithms which solve those failures at the end of the session.

  • @ankurmazumder5590
    @ankurmazumder5590 6 лет назад +1

    can you plz make a video on aho corasick algorithm? i think its much more important than other multiple string searing algorithm as it takes linear time

  • @saadullah1797
    @saadullah1797 7 лет назад +1

    how to choose the prime number if the prime number is not given?

  • @vaibhavchhabra800
    @vaibhavchhabra800 8 лет назад +2

    +Tushar Roy
    Can u make a tutorial on Aho Corasick String Matching Algorithm Plz

  • @putriaulianoer6047
    @putriaulianoer6047 8 лет назад +1

    thank you sir. i understand. nice explaination.
    teach us optimal mismatch algorithm, please sir .

  • @pralaynath447
    @pralaynath447 7 месяцев назад

    isnt this approach will create overflow problem while calculating HASH ?? if pattern is more than 63 length.that to in best case . if we choose our prime 2. how can we compute 2^64 . looks like this approach has limitation , we should be using MOD also

  • @kishanmittal431
    @kishanmittal431 4 года назад

    what if the string to be searched is too big ?
    lets say if the string is of size 10^5 and the string to be matched(present or not) is of size 10^4 then how will u calculate the hash ? pow(3,10^4) is overflowing here , so this algo doesn''t work when the string is too big ?

  • @tejasdonadkar9094
    @tejasdonadkar9094 3 года назад +1

    Thank you so much

  • @MrMrVV
    @MrMrVV 5 лет назад

    The implementation on Wikipedia is different, it seems there is another variation of Rabin Karp algorithm used by other sources like "www.geeksforgeeks.org/rabin-karp-algorithm-for-pattern-searching/". Can anybody explain why other sources use the base for calculation?

  • @gsb22
    @gsb22 3 года назад

    I'm surprised that no one really checked that this algo will break if use value 3. Start with 4 and it works, with 3 it breaks.

  • @yaaimcoder
    @yaaimcoder 2 месяца назад

    Should have commented on the uniqueness of rolling hash bruhhhh

  • @amoghasoda
    @amoghasoda 3 года назад

    Hi Tushar, great tutorial. But you are not considering overflow. So your code is wrong! Please correct it.

  • @henrydeutsch5130
    @henrydeutsch5130 2 месяца назад

    Phenomenal tutorial, very clear. Thanks!

  • @sohaibdanish246
    @sohaibdanish246 8 лет назад

    You have Nice Videos. Make videos more simple. It got confusing.

  • @codearabawy
    @codearabawy 4 года назад

    I didn't get why we need to calculate the hash if I can go like a sliding window and compare without hashing? anyone can help me understand this point, please.

  • @gollasaivenkatesh
    @gollasaivenkatesh 3 года назад

    your explaination was awesome but it gave a tle in leetcode for implement strStr() function

  • @_-6912
    @_-6912 3 года назад

    I do not understand the need of this algorithm, cant we just compare the characters instead of hash comparison and then character comparison?

  • @naveennoel9496
    @naveennoel9496 3 года назад +1

    The rolling in and out part has been flawlessly explained. 👌 Thanks!

  • @sauravjha5666
    @sauravjha5666 4 года назад

    but if we have to calculate hash for a very long pattern then we won't be able to calculate power of 101, what should i do then?

  • @aayush5474
    @aayush5474 3 года назад

    The code will not work for large length of pattern. Overflow condition needs to be addressed

  • @ivandrofly
    @ivandrofly 8 лет назад +2

    This algorithm can be optimized to θ(n-m+1)

  • @ImmigrantLife
    @ImmigrantLife 5 лет назад

    Wow, that is a great explanation. But how in the world can we come up with those 3 steps to get the rolling hash. That's the key algo in this process. I'm afraid this is like - "if you do not know, you never will know." My advice to everyone is that just keeps yourself acquainted with all the formulae. Someday, this logic might strike your mind and voila! Thank you for this video.

  • @himanshu1689
    @himanshu1689 7 лет назад

    Hello tushar I appreciate your got work and effort.Please help me out if rabin carp can search for exact pattern or not. Suppose i am searching over a text file for the word "the" to calculate the frequency but it count the sub-string with the such as they them their. What logic can be added in the rabin karp so that it can go for exact match.

  • @pythoniccypress6715
    @pythoniccypress6715 6 лет назад

    Code seems not working with large input such as ["aabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaa","baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"].
    Both of them get the same initial hash, which is weird..

  • @hafsaafreen5511
    @hafsaafreen5511 4 года назад

    Please make a video on boyer moore algorithm.

  • @venkataamareshiragavarapu900
    @venkataamareshiragavarapu900 8 лет назад

    Hello... When we use prime=101 and raise it power, say, 14, multiply with ASCII values of characters,wont it cross the range of long int ? Cant we just keep adding ASCII values of characters??

  • @JustmeAgainOk
    @JustmeAgainOk 7 лет назад

    Thanks for the vPost , just want to drop that rabin fingerprint uses the rolling hash function other way around to prevent division. e.g
    initialHash=c1*a(k-1)+c2*a(k-2)+...ck*a(0)
    ,en.wikipedia.org/wiki/Rabin_fingerprint

  • @xiaoweicheng2118
    @xiaoweicheng2118 4 года назад

    why not keep a deque to do the calculatioin?

  • @oshogarg5215
    @oshogarg5215 6 лет назад +1

    Plz improve Audio quality

  • @RohitKumar-uj4ho
    @RohitKumar-uj4ho 7 лет назад

    why to double chceck on line Number 25, if(patrnHash== textHash && check equal(,int,int,int,int))
    its like same we are using without hash
    I think checkEqual function is useless
    is tha ryt?

    • @paulrybitskyi1737
      @paulrybitskyi1737 7 лет назад

      No. The text hash and pattern hash can sometimes be identical even though their string representations may be completely different. That's why it's recommended to use a good hashing function to avoid these sort of things.

  • @NoobSaibotVII
    @NoobSaibotVII 7 лет назад

    I am being naive, but if you have to compare strings to strings anyway regardless of the hash, then why even bother hashing? To someone being naive it looks like you can just compare the substring to the sliding window string that you use to make the hash.
    i.e. 'abc' == 'abc', not (h('abc') == h('abc') && 'abc' == 'abc')

    • @sunnysam69
      @sunnysam69 7 лет назад

      Comparing strings can become intensive if the size of the pattern is very long. The complexity becomes O(m*n) in worst case.
      Instead, in Rabin-Karp method, you just have to do one comparison of hashes. Only if the hashes are same, the actual substrings are compared (to avoid hash collisions) The computation of hashes is not that intensive when compared to comparison of pattern and text, since you will be getting the current value of hash using previous value.

  • @salmanraza6362
    @salmanraza6362 8 лет назад

    I tried hard but couldn't find your video for Brute Force string matching . Could you please give me the link here.
    Have an exam on Sunday. Would really appreciate it.

  • @samchan1043
    @samchan1043 7 лет назад +1

    Very good video,thanks bro

  • @janmichaelaustria620
    @janmichaelaustria620 3 года назад

    Although this is O(mn), is it amortized O(m+n)? In expectation, we hope the hash function is good and does not always result in a collision for each substring?

  • @cantwaittowatch
    @cantwaittowatch 8 лет назад

    Thanks Tushar. If I understand this right - Brute-force time complexity is the same as this approach. So this algo excels on looks ups based on hashbased, and when hash calculation maps to collisions then this algo is no good. So do you think brute-force is better given the time complexities are same?

  • @KanagaveluSugumar
    @KanagaveluSugumar 8 лет назад

    Could you please create a playlist on Searching algorithams? It is difficult to find other searching algo tutorials by you ?

  • @somnathgarai
    @somnathgarai 8 лет назад

    The explanation was really cool.. however I lost focus when it came to the code explanation. It would be nice if you could share the code with comments explaining the methods and its working.

  • @ujking7970
    @ujking7970 5 лет назад

    Why your name is tiyuushaawwr..?

  • @seemapradhan837
    @seemapradhan837 8 лет назад

    The hash value of each substring in the base can be calculated very quickly if we use a hashing algorithm that allows us to remove the character we are discarding and add the new one. " This line is from my lecture note. I failed to understand what it means by discarding and adding and there's not much explanation for this part too. Have exam today :/

  • @DcCoO
    @DcCoO 8 лет назад

    i think you could explain what happen with big strings that brings big hash values (using modulus, etc)

  • @93744020123
    @93744020123 8 лет назад

    If the value of "m"(length of pattern) is big enough.The hash value of pattern is very big it may go out of range then what should we do....:(

    • @93744020123
      @93744020123 8 лет назад

      okay got you.Thnkz for solving the doubt..:)

  • @eequalsmc2829
    @eequalsmc2829 5 лет назад

    Could you also provide links for videos that you already made like KMP in description

  • @endersuu8359
    @endersuu8359 3 года назад

    It is quite clear, but what if the string to be matched is really long? Hash value will be huge. Is there a way to optimize?

  • @dipendrasingh3559
    @dipendrasingh3559 4 года назад

    r u still uploading videos
    sir tushar

  • @niravbharadiya2901
    @niravbharadiya2901 7 лет назад

    sir, you ain't using '%' operator to limit size of hash, so in what way you are hashing the strings to some key value, rather we can say you are attaching a numeric value for making comparison easy!?

  • @murisatyasandeepreddy1195
    @murisatyasandeepreddy1195 6 лет назад

    Can you please upload naive string search and brute force search algorithm

  • @高航-o3x
    @高航-o3x 8 лет назад

    There is a question, if the text string and the pattern string are consisted of Chinese characters and the two strings are longer a little, the hashcode may overflow so wrong results appear.As we known,Chinese characters have a large number of characters。

    • @高航-o3x
      @高航-o3x 8 лет назад

      ***** The String class in Java has a method String.indexOf(), it just use the BF rather than KMP.

  • @osamaalkhodairy3155
    @osamaalkhodairy3155 7 лет назад

    It's very simple,
    is this work ?!
    string x, y;
    int j = 0;
    for(int i = 0 ; i < x.size() ; i++){
    if(y[j] == x[i]) j++;
    else j = 0;
    if(j==y.size()-1){
    cout

  • @hannanathar3627
    @hannanathar3627 9 месяцев назад

    you are not testing at 0th index of string for pattern matching and why didn't you use modulus?

    • @pralaynath447
      @pralaynath447 7 месяцев назад

      yes , i also commented this
      "isnt this approach will create overflow problem while calculating HASH ?? if pattern is more than 63 length.that to in best case . if we choose our prime 2. how can we compute 2^64 . looks like this approach has limitation , we should be using MOD also"

  • @RollsRoyce2035
    @RollsRoyce2035 5 лет назад

    Tushar, Can you help with 2D pattern searching using Rabin Karp. TIA

  • @yashwant4129
    @yashwant4129 2 года назад

    best video on this algortihm

  • @letsghoomo
    @letsghoomo 8 лет назад

    Really detailed explanation which is understandable for all levels of students..Thank you

  • @poojagarg5505
    @poojagarg5505 7 лет назад

    Thankew so much your all videos help me alottttt.. You teach very well.. Thanx again sir

  • @halseywalker5015
    @halseywalker5015 Месяц назад

    You are the real hero !!!

  • @simonvutov7575
    @simonvutov7575 7 месяцев назад

    mate youre actually the goat

  • @NagabhushanBaddi
    @NagabhushanBaddi 8 лет назад

    Please make more videos on String algorithms....thanks for the great videos....

  • @tomchu6100
    @tomchu6100 2 года назад

    Good job! Thanks for your insights, I just have only one question, if the length of the pattern is getting really long, like more than 20 characters, then the hash value will become intolerable for most types.

  • @SEVERANCE850
    @SEVERANCE850 6 лет назад

    Your Time complexity i think so wrong it should be O(m)+O(n)

  • @akhilmaddu7752
    @akhilmaddu7752 6 лет назад

    if the hash value is not divisible by prime number what should we do?

  • @ankushtiwari9645
    @ankushtiwari9645 8 лет назад

    Tushar what if the pattern and text are numbers? How would we calculate the hash of it?

    • @ankushtiwari9645
      @ankushtiwari9645 8 лет назад

      +Tushar Roy won't it make the calculations lengthy and what if we get decimal values after dividing the prime numbers?

  • @nandkishorenangre8244
    @nandkishorenangre8244 6 лет назад

    great video at least better than what my sir taught ... _/\_...

  • @jishudohare9141
    @jishudohare9141 6 лет назад

    Awesome video but one doubt
    what if we have to find multiple patterns and length of those multiple patterns vary?

  • @lerneninverschiedenenforme7513
    @lerneninverschiedenenforme7513 7 лет назад

    Thank you understandable english! Thank you formsharing that knowledge! Helped me a lot!

  • @dron466
    @dron466 7 лет назад

    Hey Tushar please can you give me hint how I can make changes in your hash function to allow for one or zero mismatch.

  • @sitadevimuthkhod1926
    @sitadevimuthkhod1926 6 лет назад

    Is the prime value always 3? Or is it the length of pattern to be matched?

  • @ankurbmv
    @ankurbmv 8 лет назад

    Even from a newbie perspective its easy-to-understand your videos, great initiative, thank you very much.

  • @prashantkiran6869
    @prashantkiran6869 8 лет назад

    Very nice explanation.Please keep up the good work!

  • @DinGDonGvideoProduction
    @DinGDonGvideoProduction 6 лет назад

    nevr love to wtch your video bcz f your englsh

  • @PaAGadirajuSanjayVarma
    @PaAGadirajuSanjayVarma 4 года назад

    What do we say to the God of algorithms teaching
    Thank you tushur roy

  • @neerajchoudhary3709
    @neerajchoudhary3709 2 года назад

    this channel is quality

  • @anandzutshi357
    @anandzutshi357 8 лет назад

    Please upload video and explanation for finite automata and Aho-Corasick also.
    It will be very helpful.

  • @bgmibrutal4347
    @bgmibrutal4347 7 лет назад

    awesome explanation .... you save my 8 marks ... thank you very much sir !!! 😊

  • @IndianVideoGameCollector
    @IndianVideoGameCollector 7 лет назад

    thank you for the crisp explanation, better and 10x note helpful than reading a boring book

  • @TheSagarSai
    @TheSagarSai 8 лет назад

    Fantastic video..!! Thanks as always.. learning a lot. :-)

  • @sushmitagoswami7320
    @sushmitagoswami7320 2 года назад

    Excellent explanation.

  • @DevastaingDj
    @DevastaingDj 4 года назад

    Tushar Roy = Sankat Mochan

  • @omkarkulkarni9202
    @omkarkulkarni9202 5 лет назад

    @Tushar please send us or post your patron link.. i cant keep learning without returning the immense value these lectures are providing! :) Thanks a lot and god bless you

  • @Bakepichai
    @Bakepichai 7 лет назад

    Very good explanation Tushar