Should we add one more step for c or not? please clarify
Yes, you are right. After checking all patterns that start with "a" and all patterns that start with "b", PrefixSpan will check all patterns that start with "c" by doing the same process. In this case, PrefixSpan will find no frequent patterns that start with "c". This is why the step is omitted in the video, and also to keep the video short. But I should have made this clearer. Thanks for the feedback!
Here is the updated PowerPoint, with the step for "c" added:
www.philippe-fournier-viger.com/COURSES/Pattern_mining/PrefixSpan_the_presentation.pdf
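To make the "a, then b, then c" order concrete, here is a minimal Python sketch of PrefixSpan's recursion. It is simplified under stated assumptions: every itemset holds a single item (so a sequence is just a list of items), the four-sequence database is hypothetical rather than the one from the video, and the function name `prefixspan` is illustrative, not SPMF's code. The loop over frequent items treats "c" exactly like "a" and "b": it builds the projected database of <c> and recurses. The sketch also records the IDs of the sequences supporting each pattern.

```python
from collections import defaultdict


def prefixspan(database, minsup):
    """Mine frequent sequential patterns from sequences of single items.

    Simplifying assumption: one item per itemset. Returns a dict mapping each
    pattern (a tuple of items) to the sorted list of IDs (list indexes) of the
    sequences that contain it.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (seq_id, start) pairs: the suffix of sequence seq_id
        # beginning at index start, i.e. what follows the current prefix.
        occurrences = defaultdict(set)
        for seq_id, start in projected:
            for item in database[seq_id][start:]:
                occurrences[item].add(seq_id)

        # Extend the prefix with each frequent item in order: a, then b, then c...
        for item in sorted(occurrences):
            if len(occurrences[item]) < minsup:
                continue  # infrequent, so no frequent pattern can start with prefix + item
            new_prefix = prefix + (item,)
            results[new_prefix] = sorted(occurrences[item])

            # Projected database of new_prefix: for each suffix containing item,
            # keep what follows the first occurrence of item.
            new_projected = []
            for seq_id, start in projected:
                seq = database[seq_id]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((seq_id, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine((), [(seq_id, 0) for seq_id in range(len(database))])
    return results


# Hypothetical toy database: one list of items per sequence.
db = [
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["b", "a", "c"],
    ["a", "b", "d"],
]
for pattern, seq_ids in prefixspan(db, minsup=3).items():
    print(pattern, "support =", len(seq_ids), "sequence IDs =", seq_ids)
```

With minsup = 3, <c> is itself frequent in this toy database (it appears in sequences 0, 1 and 2), but its projected database contains no item frequent enough to extend it, so no longer pattern starting with "c" is produced. Handling itemsets with several items would also require extending the last itemset of the prefix (itemset extensions), which this sketch leaves out.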
Thanks a lot for responding
Very nice video reference; more thorough and more engaged than my professor!
Thank you for the awesome explanation, greetings from Mexico
Errata: In step 7, in the projected database of , the support of is 4 instead of 3.
What a great video, Sir! You really helped me understand the concept, and it helped me a lot on my final. Thank you so much.
Best explanation ever, thank you!!!!
Thanks! Happy you like it!
In step 7 in projected database of : Support of is 4 instead of 3.
Please check, and correct me if I am wrong.
You are right. It should be 4. Thanks for reporting the error!
We must do the same for c, starting from the sequence DB, right? But then there will be no frequent patterns with c, right?
Philippe Fournier-Viger
Yes, you are right. After checking all patterns that start with "a" and all patterns that start with "b", PrefixSpan will check all patterns that start with "c" by doing the same process. In this case, PrefixSpan will find no frequent patterns that start with "c". This is why the step is omitted in the video, and also to keep the video short. But I should have made this clearer. Thanks for the feedback!
Great tutorial. Thank you so much for your explanation.
Thanks for watching! Appreciate your feedback!
Hope you enjoy this video!
Really great explanation sir
Thanks for watching. Happy you like it.
Extremely helpful Sir.
Glad to hear that
Thanks for the great videos, sir. If we already have the result, for example a pattern with support 3, how could we retrieve the rows of our sequence database that contain it? Thanks.
Hi, thanks. If you want to find which sequences of the database contain a given pattern, you could write a simple algorithm. The algorithm would read each sequence. For each sequence, it would first try to find the first itemset of the pattern, {a}. Once it is found, it would continue reading the sequence to find the second itemset, {c}, and if there are more itemsets, it would continue to find the third one, and so on. If all the itemsets are found, it is a match. By repeating this for all sequences of the database, you can find all the sequences that contain the pattern.
It is not complicated to do. It only takes a bit of programming. If you use my software SPMF, there is an algorithm called OCCUR that does exactly that. You can give a list of patterns to OCCUR, and it will find the sequences that contain each pattern and output them.
Another option is to modify an algorithm like PrefixSpan so that it outputs the list of sequences containing each pattern. This is also not very hard to do. In my software SPMF, some algorithms have that option. It is called "show sequence IDs".
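The simple algorithm described above can be sketched in Python as follows. It is a minimal illustration, not the OCCUR implementation in SPMF: a sequence is assumed to be a list of Python sets (one set per itemset), a pattern uses the same format, and the function names `contains` and `sequences_containing` are hypothetical. Matching each pattern itemset at the earliest possible position is enough, because matching it later could never leave more room for the remaining itemsets.

```python
def contains(sequence, pattern):
    """Return True if the itemsets of pattern appear, in order, in distinct
    itemsets of sequence (each pattern itemset must be a subset of one)."""
    pos = 0
    for itemset in pattern:
        # Read the sequence forward until an itemset includes this pattern itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False  # reached the end without finding this itemset
        pos += 1  # the next pattern itemset must be found strictly later
    return True  # all itemsets were found: it is a match


def sequences_containing(database, pattern):
    """Return the IDs (list indexes) of the sequences that contain pattern."""
    return [seq_id for seq_id, seq in enumerate(database) if contains(seq, pattern)]


# Hypothetical database: each sequence is a list of itemsets (Python sets).
db = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}],
    [{"a", "d"}, {"c"}],
    [{"e", "f"}, {"a", "b"}],
    [{"c"}, {"a"}],
]
print(sequences_containing(db, [{"a"}, {"c"}]))  # prints [0, 1]
```

Here sequences 0 and 1 contain the pattern {a} followed by {c}; in sequences 2 and 3, no itemset containing c comes after an itemset containing a.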
Thanks for watching ;-)
@philfv Thanks for the reply and for providing us the software 😊.
I already tried your SPMF software through the Python wrapper github.com/lolei/spmf-py and set the option "show sequence IDs". It works! I was just curious about the case where we already have the pattern and want to run the query. I will try OCCUR. Once again, thanks for sharing, sir.
Can you make a video about the BIDE algorithm? I read the authors' article about the algorithm, but as far as I remember, some of the points were not clear.
That paper is very tricky. It took me a long time to understand it when I first read it. As I remember, the problem is that the paper explains only the simple case where each itemset has a single item. But when we try to implement it for the general case, it becomes very complex to make it work.
The SPMF implementation of BIDE has a bug that I did not fix, because it is too complicated to fix and because there are other algorithms that find closed sequential patterns.
But for BIDE, I actually spent weeks on the first implementation in SPMF. Then I tried to redo it completely from scratch to make it better, but I still had trouble with the general case. It is not an easy algorithm to implement correctly for the general case.
I don't think that I will make a video about BIDE now, but I will keep your suggestion in mind. I might do it later, or at least make a video about closed sequential pattern mining that outlines the main idea.
Thanks a lot, that was really helpful.
Can you share an example of the SPADE algorithm?
Good evening, thanks for your suggestion. I have some slides about the SPADE algorithm that I use for teaching, but I need to polish them a bit more before recording a video about SPADE. I will keep your suggestion in mind and try to make a video when I have time. In the meantime, if you want slides for SPADE, you can send me an e-mail at philfv8@yahoo.com and I can share my current slides with you.
Great tutorial, sir. Thank you 🙏🏻
Have you ever added this algorithm to Weka? I want to do that but have had many difficulties. Can you suggest a way to solve that problem? Thank you so much.
Hi, you could check the SPMF wrapper for Weka that was developed by someone else:
github.com/christopher-beckham/spmf-wrapper
It could help you call the sequential pattern mining algorithms from SPMF, such as PrefixSpan, from within Weka. I did not try it, but it may work.
Great Video!!
Good shit
thanks
Thanks for watching!
worst
what a weird thing to say on such a video
I do these videos to share knowledge freely with students all over the world. Any constructive suggestions for improvements are welcome.