Hi, really interesting and helpful video. I would like to ask whether replacing the or_postings and and_postings functions with set union and intersection would decrease performance.
Hi Iraklis, I suspect the native set operations will be significantly faster than the example provided here using lists. In a quick test on two very small posting lists (10 elements), running the or_postings function took 4.82 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each), while the native | operator on sets took 1.17 µs ± 6.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) after converting the lists to set types, which are implemented as hash tables. The CPython source for set operations is in Python's Objects/setobject.c for additional reference.
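For anyone who wants to try the comparison themselves, here is a minimal sketch. The or_postings and and_postings below are stand-in merge implementations over sorted lists, written for illustration; they are assumptions and may differ from the code in the video. The two 10-element lists and the compare helper are also hypothetical, and timings will vary by machine.

```python
import timeit


def or_postings(p1, p2):
    """Union of two sorted posting lists via a linear merge (stand-in, not the video's code)."""
    i, j, out = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            out.append(p1[i])
            i += 1
        else:
            out.append(p2[j])
            j += 1
    # One of the lists is exhausted; append whatever remains of the other.
    out.extend(p1[i:])
    out.extend(p2[j:])
    return out


def and_postings(p1, p2):
    """Intersection of two sorted posting lists via a linear merge (stand-in)."""
    i, j, out = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical sample data: two sorted posting lists of 10 doc IDs each.
a = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
b = [2, 3, 4, 7, 8, 11, 12, 15, 16, 19]
sa, sb = set(a), set(b)  # conversion cost is paid once, outside the timed call


def compare(number=100_000):
    """Return (list_seconds, set_seconds) for `number` union operations each."""
    t_list = timeit.timeit(lambda: or_postings(a, b), number=number)
    t_set = timeit.timeit(lambda: sa | sb, number=number)
    return t_list, t_set


if __name__ == "__main__":
    t_list, t_set = compare()
    print(f"or_postings: {t_list:.3f} s   set union: {t_set:.3f} s")
```

Note that sets do not preserve order, so if you need a sorted posting list afterwards you have to call sorted() on the result, and that cost should be counted too.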
Looking forward to more of this series!!
Same :)
Hello Wes, I rarely log into YouTube; you are an exception... I wanted to say I enjoyed your videos a lot. A big THANK YOU!
Helped me and my project. Thank you very much for putting this together.
Hello Wes, good series so far. For those who want a simplified approach, use sets and their operations, such as union and intersection.
Boolean operations on sets are awesome
Very useful video. I hope this series can continue some day...
I am watching this series, and thank you for these videos. I am taking an information retrieval course and this series helps me a lot. Keep going!!!
Hi Perhat Meredow, thanks for the message! I will certainly keep going. Thanks for tuning in. Good luck!
Is it only 3 classes? Man, they are excellent courses!! Please let me know if there is a part 4 and so on...
Very nicely explained the inverted index model.
Thanks a lot. Really good tutorial.
When will part 4 be released?
Thanks a lot! Hope you create more series.
How do I take input if I have 1000 documents inside a folder?
Can you also share the document? It would be really great. Thank you!
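One possible way to handle a folder of documents is sketched below. It assumes plain-text .txt files, a simple lowercase word tokenizer, and doc IDs assigned by sorted file name; build_index and demo are hypothetical names for illustration, not functions from the video.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path


def build_index(folder):
    """Build an inverted index from every .txt file in `folder`.

    Returns (index, doc_names): index maps term -> sorted list of doc IDs,
    doc_names maps doc ID -> file name so results can be shown by name.
    """
    postings = defaultdict(set)
    doc_names = {}
    # Sorting the paths keeps doc IDs stable from run to run.
    for doc_id, path in enumerate(sorted(Path(folder).glob("*.txt")), start=1):
        doc_names[doc_id] = path.name
        text = path.read_text(encoding="utf-8", errors="ignore")
        # Assumed tokenizer: lowercase, split on runs of word characters.
        for term in re.findall(r"\w+", text.lower()):
            postings[term].add(doc_id)
    index = {term: sorted(ids) for term, ids in postings.items()}
    return index, doc_names


def demo():
    """Index two tiny throwaway files in a temporary folder."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.txt").write_text("the quick brown fox", encoding="utf-8")
        Path(tmp, "b.txt").write_text("the lazy dog", encoding="utf-8")
        return build_index(tmp)


if __name__ == "__main__":
    index, doc_names = demo()
    # Map the doc IDs in a posting list back to file names for display.
    print([doc_names[d] for d in index["the"]])
```

For a real collection, pass your own folder path to build_index; the doc_names mapping is what lets you print file names instead of bare IDs in the AND/OR results.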
This is really helpful. Can you please upload similar videos for the vector space model, BM25, and language models? Looking forward to them. Thanks
Thanks Megha. I have notes on Okapi BM-25 and vector approaches, so I’ll try to get a recording done soon. Appreciate the suggestion! All the best
yeah I totally agree, would be great to have vids on those topics
Can you show the code in C#? I need it for a project delivery tomorrow for my final exam in college.
Please?
Thanks, sir, but I want to show the documents when I run it (the results of the OR and AND functions).
I am taking an info and data retrieval class this fall, and it looks like you're the only one who's going to get me to pass the subject 😛
But looks like it’s only in 3 classes 😕
Thanks a lot
Wes...Thank you. Too many ads.
Looking forward to more of this series!!