Knuth Morris Pratt (KMP) String Search Algorithm - tutorial with failure function in Java

Поделиться
HTML-код
  • Опубликовано: 14 янв 2025

Комментарии • 118

  • @maridavies3425
    @maridavies3425 4 года назад +55

    This is the best explanation of how the prefix array is constructed. Awesome job and thanks!

  • @mandeepsinghrr
    @mandeepsinghrr 4 года назад +10

    This channel honestly has the best explanation for all the difficult problems. The nlogn LIS implementation was too complex for me to understand, but then I stumbled upon this channel. The same goes for KMP! Amazing work! Really appreciate how you explain it in a simple manner!

    • @stablesort
      @stablesort  4 года назад +1

      Thank you for such a wonderful compliment! I am glad to hear that the videos are helpful!

  • @sloppyguy4940
    @sloppyguy4940 4 года назад +20

    Even a sloppy guy like me would understand this explanation, well done!
    I also like how you comunicate :)

    • @stablesort
      @stablesort  4 года назад +5

      Thanks for the compliment! Cheers, Sloppy Guy!

  • @junkaze08
    @junkaze08 3 года назад +3

    This is far the best explanation and discussion of the KMP, thank you for saving my semester.

    • @stablesort
      @stablesort  3 года назад

      Great! I am glad to hear that it was helpful

  • @xiaoxiaoxiao2043
    @xiaoxiaoxiao2043 2 года назад +1

    The best explanation of the MIT version of KMP. Good job!!!
    I learned the KMP algorithm through the 《Algorithm》which is public by Princeton University. The MIT version of KMP and Princeton's version is the two main version of the KMP algorithm.

  • @ram-oj2ij
    @ram-oj2ij 4 года назад +2

    I watched other videos and it finally clicked with this video, thanks. All your other videos are produced very well and great.

  • @prasannakumarnaidu5030
    @prasannakumarnaidu5030 4 года назад +7

    Thank you sir
    After lot of confusion in understanding this algorithm you made it easy to understand . Hope you make many videos like this.

    • @stablesort
      @stablesort  4 года назад +2

      I am glad it was helpful! But this is just a hobby for me. I don't make any money from it :) Cheers!

  • @Japo0po0
    @Japo0po0 8 месяцев назад

    Your videos are the best for learning these algorithms. Much appreciated.

  • @irfanquader3944
    @irfanquader3944 4 года назад +2

    your explanation method (with the visualization ) and calm way of explaining really helped me understand the algorithm :)

    • @stablesort
      @stablesort  4 года назад

      Glad to hear it and thanks for the compliment!

  • @AluSoft
    @AluSoft 2 года назад

    Hi Stable sort!
    Please keep posting !!! You have the best videos I've ever seen!!!

  • @abdalla4425
    @abdalla4425 9 месяцев назад +1

    Wow this is the a way better explanation than the GFG article. Thanks

  • @juniperandspice
    @juniperandspice 3 года назад

    i am so glad i stumbled onto this channel

  • @danielsasse7609
    @danielsasse7609 3 года назад +2

    This was incredibly helpful for understanding the concepts behind the algorithm, Thank you!

    • @stablesort
      @stablesort  3 года назад

      Wonderful! And thanks for the leaving a few good words

  • @vinayak186f3
    @vinayak186f3 3 года назад

    Just 1:44 into the video and got the ans I was long been pondering over , Thank you sir

  • @abelsimon5308
    @abelsimon5308 4 года назад +3

    thanks a lot for going through these classics, also your code looks simple and concise!

    • @stablesort
      @stablesort  4 года назад

      Glad you like them! I try to bring something new to the table or explain it in a way that is "less confusing" than typical explanations.

  • @lukelyu3264
    @lukelyu3264 4 года назад +1

    The most efficient one out of those confusing videos

    • @stablesort
      @stablesort  4 года назад

      Just trying to make things a little less confusing =)

  • @AleemKhan42069
    @AleemKhan42069 2 года назад

    You sir have saved my career thanks

  • @Prashantkumar-pn6qq
    @Prashantkumar-pn6qq 4 года назад +1

    Very good and illustrative explanation!

  • @mohamedmusharaf406
    @mohamedmusharaf406 4 года назад +2

    Wow Sir. The explanation is really good!! Glad I've found your channel!!

    • @stablesort
      @stablesort  4 года назад

      Glad to hear that and thanks for the compliment!

  • @amitkumarprajapati5094
    @amitkumarprajapati5094 3 года назад +1

    Really appreciated your way of explanation sir

    • @stablesort
      @stablesort  3 года назад

      Thanks! Glad to hear that!

  • @Krishnayadav-ln8ru
    @Krishnayadav-ln8ru 3 года назад +2

    perfect explanation

  • @maulindusarkar4581
    @maulindusarkar4581 4 года назад +1

    It's awesome man ! Thanks for this beautiful explanation !

  • @ryancodrai487
    @ryancodrai487 3 года назад +1

    Fantastic video

  • @harshith_takkala
    @harshith_takkala 2 года назад +1

    thanks man!

  • @troglodyto
    @troglodyto 4 года назад +1

    very clear. thank you for your time and energy!

  • @bigsnorts4320
    @bigsnorts4320 2 года назад

    fantastic presentation. Thank you so much for this.

  • @frankwu1043
    @frankwu1043 4 года назад +1

    Excellent work! Please keep it up.

    • @stablesort
      @stablesort  4 года назад

      Thanks for the compliment! Let me know if there are other topics that you'd like to see covered.

    • @frankwu1043
      @frankwu1043 4 года назад

      @@stablesort Hi Sir, could you cover applications of finding shortest path? I learned Dijkstra's Algorithm, Prim, but I don't know how to apply them into a weighted graph with Java. I'm kind of missed up with content and Java code. Thanks!

  • @create_space812
    @create_space812 4 года назад +1

    Best explanation!

  • @nickchen8259
    @nickchen8259 2 года назад +1

    Awesome job

  • @pavanvenkat9182
    @pavanvenkat9182 4 года назад +2

    Amazing explanation 👌👌

    • @stablesort
      @stablesort  4 года назад

      Thanks for the compliment!

  • @mfrice
    @mfrice 2 года назад +12

    Hi, I think there is another error at 5:23 in the animation. The shift should be (8-2)=6 instead of (9-3)=6. Although they happen to have the same results, the last 'c' in the pattern is a mismatch so it should not be counted in the matched length. Consider a modified example, the string is "abdxxxabdxxxabcde" and the pattern is "abdxxxabcde". The fist mismatch is at the 'c' in the pattern, and the lps of 'c' is 0. The shift should be the matched length 8 minus the lps of the second 'b' which is 2, which results in 6, rather than (9 - 0) = 9. Do you feel it make sense?

    • @Kuppu-pk
      @Kuppu-pk 2 года назад +3

      it perfectly makes sense to me. Similarly, i see at 6:03 we see that the video shows as 3-0 but he does shift only by 2 which doesn't make sense. As per your clarification, the prefix length is 2 and value of last matched b is 0, so, 2-0 = 2 which is the two shift that the presenter makes.

    • @vmani369
      @vmani369 2 года назад

      Yes it seems so here

    • @jamesbridger9878
      @jamesbridger9878 2 года назад

      Thanks for the clarification. I actually got myself confused when looking through my teachers slides and trying to apply the video's formula.

  • @zaidziad1456
    @zaidziad1456 3 года назад +1

    Awesome!

  • @soroushhh1586
    @soroushhh1586 3 года назад +1

    Thanks man 🤘🙏

  • @AlfredNrejaj-pn9jd
    @AlfredNrejaj-pn9jd 5 дней назад

    Very appreciated

  • @manojkvn2632
    @manojkvn2632 4 года назад +1

    great explanation.

    • @stablesort
      @stablesort  4 года назад

      Thanks for the compliment!

  • @tsvetomirgalabov110
    @tsvetomirgalabov110 4 года назад +1

    awesome JOB thank you

  • @karanbhatia6712
    @karanbhatia6712 3 года назад +1

    Great Video! Thanks :)

  • @misoren9645
    @misoren9645 3 года назад +3

    Thank you for the video, nice explanation.
    Can you please explain "else if(prefixLen > 0) prefixLen = ar[prefixLen]" ?
    I don't see where in the simulation something like that was used, I saw either reset to 0 or match current characters with the prefix. prefixLen++; ar[i] = prefixLen.

    • @stablesort
      @stablesort  3 года назад +2

      Good question. So that happens when we have a mismatch and the length of the prefix is something other than zero. Consider the output of calcPrefixLen("xxyxxxxx"):
      [-1, 0, 1, 0, 1, 2, 2, 2, 2]
      Note how the last three 'x' characters don't increase the length of the prefix, but they keep holding the length of the prefix found so far. I hope this helps!

  • @jacobkurien
    @jacobkurien 3 года назад +1

    awesome as always. can you do a video on catalan numbers

    • @stablesort
      @stablesort  3 года назад

      That's a great suggestion - thanks! Catalan numbers keep popping up in leetcode questions as well, so yeah, definitely worth covering!

  • @godofkings4366
    @godofkings4366 Год назад

    thank you very much

  • @FeuersurferLP
    @FeuersurferLP 3 года назад +1

    Hi! First of all: Great video! Really like your style of explaining.
    But there's one thing I don't quite get.. At 5:28 the prefix that already matched is "ab". Since "c" didn't match the text at pos 11, we look if the character at pattern-position 3 (coincidentally also c) matches the character at text-position 11, right?
    So.. let's say "c" matched but the text at pos 12 didn't match "y" (so text i.E. ---abcxxxabc----). According to the table and your explanation we move back all the way to the beginning. But shouldnt we check if the text at position 12 matches the pattern at position 4 (x), since the prefix that matched is "abc"?
    Actually, I debugged your code and it does exactly what I described above. I chose the input string "---abcxxxabc---" and after failing to match "y" the variable p changed to 3, not to 0. This happens because in the code you only increment p when you got a match, so at the point of comparing "y" with the text the prefix length is still 3. In your animation on the other hand you increment p before looking wheter or not you have a match, which would lead to a prefix length of 0.
    So actually, the prefix when checking wheter text at pos 11 is "c" is not 3, as you showed in your video (5:19), but instead it is 2. (also in your code, since there you didn't move yet and the matched prefix is "ab"!). This also checks out with your formula for the shift. You matched 8 not 9 so far, so "shift by" is = 8 - 2 = 6!
    I would love to hear your opinion on my points. Your code seems to work like i described it, but maby you weren't explaining 1:1 how the code works and i missed a point there.. I am by far an expert on the topic 😅
    Anyways, greetings from germany, keep up the great work!

    • @stablesort
      @stablesort  3 года назад

      Hi Josh, I had a similar conversation with another viewer, @Achyut Sapkota, some time ago. We concluded that the code works, the formula works, the explanation also works but there is a error in the animation at 6:04. The animation at that moment shows that the shift is made by only two characters, even though I actually say out loud that the shift should be done by 3. Sorry about that - I didn't catch that before posting the video. But I did document it in the description of the video after the fact. Please take a look there as I go into detail explaining the actual implementation. Hope this helps!

    • @FeuersurferLP
      @FeuersurferLP 3 года назад +1

      ​@@stablesort Hi, I saw the conversation you were having with @Achyut Sapkota and that isn't the point I am having an Issue with. My Issue is that according to your explanation, when matching the text "---abcxxxabc----" with "y" of the pattern (and failing) your table would tell you to shift by the whole 10, which is incorrect and not what your code does. Your code checks if after the previously matched prefix "abc" the current letter of the input is a "x", which is the correct behaviour (because "abcx" is a valid prefix). Hope this clarifies what my problem is, for a more detailed explanation please see my comment above.

    • @FeuersurferLP
      @FeuersurferLP 3 года назад +1

      @@stablesort Please note that I really do enjoy your videos and this isn't any kind of personal attack =)
      But in my opinion your animation is incorrect (and i'm not referring to the 2/3 shift) and doesn't work like your code which really gave me a hard time. So I really want to sort this out, so future viewers won't be having an issue with that part. I think I really made clear where my issue lies and i hope you give it a fair shot of seeing where I'm coming from!

    • @stablesort
      @stablesort  3 года назад

      Hi Josh. No offense taken at all. If anything, I appreciate that you spent time and effort on sorting this out =)
      I think what I was trying to do was to first explain the algorithm in a way that makes the most intuitive sense - shifting the pattern string visually. For that, I had to have a formula that calculates the shift amount. The code, of course, doesn't shift anything - it just adjusts the index to the pattern string, p. As you pointed out, in the animation p looks to be incremented too soon (your example with of what would happen with ---abcxxxabc---- text is spot on). That's the side effect of "shifting" the pattern string. I think if I were to redo the video, I'd explain w/o focusing so much on the shifting.

  • @thirum5940
    @thirum5940 4 года назад +1

    @Andre Violentyev - Can you please elaborate on "How after a mismatch each character is examined atmost twice and how the time complexity is linear to the length of the text"?

    • @stablesort
      @stablesort  4 года назад +2

      In the typical bruit force algorithm, whenever a mismatch occurs between the text and the pattern string, the pattern string is shirted by just 1 position to the right. This leads to O(n * m) running time, n and m being the lengths of the text and pattern string. But in KMP algorithm, the pattern string is shifted by more than one position. In fact, it is shifted so much, that each character in the text is examined at most twice. Meaning, some text character text[i] is checked to see if it matches with some pattern character pattern[j]. If no match, the pattern is shifted, then text[i] is examined one more time to see if it matches one other possible pattern character pattern[k]. If there is no match again, then the pattern string is shifted past where text[i] is. So at most there were two examinations done per text character. Therefore, the time complexity ends up being O(2n + m), which reduces to O(n + m), linear time complexity.
      I hope this helps but let me know if you have other questions.

    • @thirum5940
      @thirum5940 4 года назад

      @@stablesort Even after spending so much time, I was struggling to understand this part. Now it is clear to me. I can't thank you enough. I hope you keep making this kind of videos and hopefully one day on aho-corasick algorithm :)

    • @stablesort
      @stablesort  4 года назад +1

      @@thirum5940 Glad to hear it. I am adding Aho-Corasick algorithm to my to-do list - thanks for the suggestion 🙂

  • @TheJakobPolegek
    @TheJakobPolegek 3 года назад

    Thanks blood.

  • @fenil197
    @fenil197 4 года назад +1

    I liked and understood very well .I currently researching this KMP Algo .Topic , can please guide me. what is next possible solution this algo.?

    • @stablesort
      @stablesort  4 года назад

      Hi Fenil, I am not sure if I understand your question. What do you mean by "next possible solution"? In case you are looking for an example implementation, here is a link to the source code, written in java: bitbucket.org/StableSort/play/src/master/src/com/stablesort/stringmatch/KnuthMorrisPratt.java

  • @utkarshnaik1415
    @utkarshnaik1415 4 года назад +1

    If the mismatch is at 'y' character then the shift will be by size(pattern string) as we have 0 in the pattern length array , this seems wrong and the shift should be by size(pattern string - 3)

    • @stablesort
      @stablesort  4 года назад +1

      Thanks for your question. The character 'y' in our example pattern string is found only at the right end of the pattern string. Meaning, there are no other 'y' characters on the left side. If we matched the previous 9 characters, we know for sure that none of those 9 characters are 'y'. Therefore, no need to subtract 3. You could just fast forward by 9. I hope this helps but let me know if that's not clear.

    • @utkarshnaik1415
      @utkarshnaik1415 4 года назад +1

      @@stablesort how would it work out for following search string for the pattern explained in the video : abcxxxabcxxxabcy
      I believe using pattern length as 0 for y character would cause us to miss the match .

    • @stablesort
      @stablesort  4 года назад

      @@utkarshnaik1415 As they say "code resolves arguments" - I tried it just now, passing that as input (the code is linked in the description but here it is again: bitbucket.org/StableSort/play/src/master/src/com/stablesort/stringmatch/KnuthMorrisPratt.java ) and the algorithm did find the match at index 6. Try it yourself if you don't believe me =)

    • @FeuersurferLP
      @FeuersurferLP 3 года назад

      @@stablesort But we aren't looking for another "y". The matched prefix when checking if the text is "y" would be "abc", so we have to check if the symbol that isn't "y" is "x", since then we would have "abcx", which is the beginning of our pattern string and therefore shouldnt be skipped.

    • @stablesort
      @stablesort  3 года назад

      @@FeuersurferLP Thanks for your comment but I am not sure if I fully understand your question. Could you possibly rephrase it on its own? Oh wait, you may have done just that in another comment. I am going to read over that first =)

  • @graju2000
    @graju2000 4 года назад +2

    Please don't stop your service :)

    • @stablesort
      @stablesort  4 года назад +1

      No worries! I am working on the next episode as we speak =)

  • @narayangupta2623
    @narayangupta2623 3 года назад

    At 6:05 your formula tells to shift by 3 but you shifted by 2(Which is correct). Please clearify the formula.

    • @stablesort
      @stablesort  3 года назад +1

      Thanks for posting your question. Some time ago, another viewer, Achyut Sapkota, pointed out this same exact mistake. Let me copy-n-paste my response to:
      ... You are right, my animation at 6:04 should have moved not by 2 but by try 3 characters. My mistake! By the way, at that point we'd also increment the pointer in the text to the next character. I double checked the code ( bitbucket.org/StableSort/play/src/master/src/com/stablesort/stringmatch/KnuthMorrisPratt.java ) and there it looks to be correct - once we get to index zero of the prefixLen array, which stores -1, we increment both the text and the pattern string pointers.
      p = prefixLen[p];
      if (p < 0) {
      t++;
      p++;
      }
      In the above example once we shift by 3, we essentially go "out of bounds" for the pattern string. The illustrations show that the first character, 'a', is at index 1 of the prefixLen array (not index 0, which is probably what is causing confusion since most programming languages use 0 based indexes for strings). So the formula works, it's just that, once we detect the "out of bounds" condition, we reset the pattern string pointer to the first character, and increment the text pointer by 1. In other words, we shift the pattern string so that its first character aligns with the next character in the text. This is what insures that each character in the text is examined at most twice.
      I hope this clarified things a little as opposed to making it even more confusing =) But either way, please let me know.
      Thanks!

  • @decentralmultiverse6719
    @decentralmultiverse6719 4 года назад +1

    Good explanation, Could you make a video on lazy propagation in segment tree, Thanks

    • @stablesort
      @stablesort  4 года назад

      Thanks for the suggestion. I'll look into lazy propagation. It'd be a good follow up to my earlier video on segment trees (in case you have not seen it: ruclips.net/video/xztU7lmDLv8/видео.html )

  • @ggorkk
    @ggorkk Год назад

    Hi sir, in the example when the second c is mismatched the first time, we already know that the pattern "abc" is not possible to be in the text at index 9 of the text. Shouldn't this information be used to shift the p variable all the way instead of only 9-3? Does KMP algorithm consider this information as well?

  • @20aatif
    @20aatif 4 года назад +1

    Please come up with lazy propogation and Fenwick tree

    • @stablesort
      @stablesort  4 года назад

      Well, I do have a Fenwick Tree tutorial: ruclips.net/video/uSFzHCZ4E-8/видео.html
      But to keep its length to a under 10 minutes, I did not cover lazy propagation. Thanks for the suggestion, I should probably make a followup video and tackle lazy propagation. Cheers!

  • @RaceMusgraveASU
    @RaceMusgraveASU Год назад +2

    Hi, unfortunately this doesn't work for all cases. However if we modify the formula used for shifting very slightly, it will work. shift = (len-1) - prefixLen[len - 1]
    I know many people wont believe me, but I agonized over this for hours.
    lets look at an example
    text: ababcabcabababd
    pattern: ababd
    lps for our new pattern would be: 0 0 1 2 0
    now lets play out formula as laid by the video
    ababcabcabababd
    ababd
    x --> mismatch at 5, 5 - 0 = shift 5
    ababcabcabababd
    ababd
    x --> mismatch at 1, 1 - 0 = shift 1
    ababcabcabababd
    ababd
    x --> mismatch at 3, 3 - 1 = shift 2
    ababcabcabababd
    ababd
    x --> mismatch at 1, 1 - 0 = shift 1
    ababcabcabababd
    ababd
    x --> mismatch at 5, 5 - 0 = shift 5
    ababcabcabababd
    ababd
    as we can see, the matching alignment was entirely overlooked when using the formula as given by the video. but if we modify it as described before, not only do we find the match for the example I just gave, we can find it for any other example as well.
    to modify the formula, either modify it according to what I explained previously, or use the formula he gave but use the last matching letter instead of the first mismatched letter

    • @bulazs01
      @bulazs01 Год назад

      My guy, im learning this rn in algorithm class so i checked out this video. i looked at it and was thinking what would have happened if the y was the mismatch, and started doubting my knowledge, so thanks for reassuring me and giving me an actual working formula. Thanks mate :D

  • @anonymoussloth6687
    @anonymoussloth6687 3 года назад

    I don't understand one thing. suppose we have a pattern like this:
    txt: a a a b a (and so on...)
    pat: a a a b b
    lps: 0 1 2 0 0
    In this case, where there is a mismatch at the last a and b, I understand why pat is started at 0. because there is no prefix that is also a suffix at that point in the pattern.
    But why is the txt pointer not shifted back?? How can we be sure that if the pattern is just shifted one place to the right (like in the naive/brute force method) that we won't find a match?
    meaning:
    a a a b a ...
    a a a b b
    I know in this case it is not a match, but how can we be sure that this is the case always?
    My question basically is, in this algo, we have a i pointer that is iterating through the txt and a j pointer that is iterating through the pat. when there is a mismatch, the j pattern is shifted back a certain amount (which i understand why) but the i pointer is not shifted at all. THIS is what I don't understand. j pointer is shifted based on if there is a prefix that is also a suffix and that makes sense. But i dont understand why the i pointer is not shifted REGARDLESS of whether a prefix is found or not.

    • @stablesort
      @stablesort  3 года назад

      Hi Ayush, thanks for posting your question. In your example, there are 4 characters that matched and then there is a mismatch on the 5th character. Now, we know something about the first 4 characters - the fact that they matched contains extra information that Knuth-Morris-Pratt make use of. The fact that the first 'b' character matched, implies that there is no other 'b' characters in the first 3 character sequence. Therefore, we don't need to shift back the text pointer (the i pointer in your example). This is a little tricky but I hope the animation helps visualizing it.

    • @anonymoussloth6687
      @anonymoussloth6687 3 года назад

      @@stablesort I am not sure what you mean by the first b matching implies there is no other b in the first 3. How can we know that? The algorithm doesn't know if it's the first b or any amount of b.
      If the txt was: a b c b... And the pattern was a b c x then when there is a mismatch between x and b, we don't know if there was a b before. All we know is if there is a prefix that is also a suffix in the pattern before x. And in this case, there isn't because ab is not the same as cx. Basically, I don't understand why the KMP algo doesn't move the I pointer back. If there is a prefix that is also a suffix, that only means that the j pointer should be moved back by that amount. But why is the I pointer not going to the next character in the text like in the naive algo?

    • @stablesort
      @stablesort  3 года назад

      @@anonymoussloth6687 You are asking the right questions. Let's look at your first example:
      txt: a a a b a (and so on...)
      pat: a a a b b
      lps: 0 1 2 0 0
      I'll try to elaborate a little - the 1st 'b' matched but we have a mismatch on the 2nd 'b'. The pattern string is then slid to the right by 4 and its pointer is reset to 0. This is because the pattern string does not start with 'b' (lps[] for that character is 0). We did match on first 4 characters. The sequence of those characters matters. The brute force algorithm is blind to this idea.
      Let's have a thought experiment: suppose we have a very long text, 1M totally random characters. Then we also have a long pattern string, suppose 1K long. It just so happens that the first 999 characters matched but there is a mismatch on the 1000th character. Now, do you agree that there is no point of sliding the pattern string over to the right by just 1 character (as is done by the brute force algorithm)? There is just no chance that by sliding the pattern by 1 character we would happen to have a full match. Intuitively I hope you can see that we'd probably have to slide the pattern string over by some number close to 999. May be it'd be 995, if the last 4 characters of the pattern string happen to be also the first 4 - the whole prefix idea =). What do you think? Do you buy this argument?

    • @anonymoussloth6687
      @anonymoussloth6687 3 года назад +1

      @@stablesort I see. Now it's clear. Thank you! And the video was awesome!

    • @stablesort
      @stablesort  3 года назад

      @@anonymoussloth6687 Cheers!

  • @asitisj
    @asitisj 4 года назад

    Noob question : why did you prefer to write a subroutine to function, to create an array?

    • @stablesort
      @stablesort  4 года назад +1

      Just to keep the size of each function in check - it's easier to read, IMHO.

  • @MatkoFaka
    @MatkoFaka 3 года назад

    Hi, question for everybody.
    Some examples/code of KMP on the internet use DFA which is basically a table/2Darray. Are those examples wrong or is it just another variant of KMP?

    • @algorithmtheory
      @algorithmtheory 2 года назад

      The DFA variation is completely analogous to the construction of the prefix array. You encode the DFA by looking at all of the possible prefixes of your pattern and run the string on the automaton. The encoding of the DFA is exactly the same as the prefix table. Or maybe not exactly the same, but an equivalent variant.

  • @achyutsapkota1522
    @achyutsapkota1522 4 года назад

    I am afraid that, in explanation of algorithm, Shifting by 9-3=6 and 3-0=3 doesn' have consistency. In later case, it seems only two characters are shifted. I apologize in advance, if I have misunderstood

    • @stablesort
      @stablesort  4 года назад

      Thanks for bringing up this question. You are right, my animation at 6:04 should have moved not by 2 but by try 3 characters. My mistake! By the way, at that point we'd also increment the pointer in the text to the next character. I double checked the code ( bitbucket.org/StableSort/play/src/master/src/com/stablesort/stringmatch/KnuthMorrisPratt.java ) and there it looks to be correct - once we get to index zero of the prefixLen array, which stores -1, we increment both the text and the pattern string pointers.
      p = prefixLen[p];
      if (p < 0) {
      t++;
      p++;
      }
      Thanks for keeping me honest!

    • @achyutsapkota1522
      @achyutsapkota1522 4 года назад

      @@stablesort Thank you very much for your kind response.I was trying to find a suitable way to explain the algorithm and I thought the shifting formula you mentioned was an easier way to explain KMP algorithm. From an explanation perspective of the algorithm, regardless of the program, I still couldn't understand how shifting by 3 characters in the mentioned section makes the correct representation of the algorithm. I think the formula of shifting ie. "shift by=len-prefixlen[len] " gives incorrect shifting length in the case of "0" in prefixlen[len] .

    • @stablesort
      @stablesort  4 года назад +1

      @@achyutsapkota1522 In the above example once we shift by 3, we essentially go "out of bounds" for the pattern string. The illustrations show that the first character, 'a', is at index 1 of the prefixLen array (not index 0, which is probably what is causing confusion since most programming languages use 0 based indexes for strings). So the formula works, it's just that, once we detect the "out of bounds" condition, we reset the pattern string pointer to the first character, and increment the text pointer by 1. In other words, we shift the pattern string so that its first character aligns with the next character in the text. This is what insures that each character in the text is examined at most twice.
      I hope this clarified things a little as opposed to making it even more confusing =) But either way, please let me know.
      Thanks!

    • @achyutsapkota1522
      @achyutsapkota1522 4 года назад +1

      @@stablesort Thank you very much for your kind explanation. I got it perfectly now. I actually had misunderstood the formula. I had interpreted "shifting by 6" in a different way. Coincidently, that interpretation, though wrong, produced the same shifting scenario. Apologies for taking your time.

    • @stablesort
      @stablesort  4 года назад +1

      @@achyutsapkota1522 thanks again for posting your question. Hopefully this discussion will help someone else down the road.