Text tokenization is one of the most overlooked topics in LLMs, although it plays a key role in how they work. Take a look at the following video to see how the most popular tokenization methods work: ruclips.net/video/hL4ZnAWSyuU/видео.html
best explanation on youtube atm
Thanks! Glad you think so! :)
beeest
You have explained a complex topic in very simple terms. Keep up the good work.
Thanks! Will do! :)
great explanation, massively underrated video
Thanks! Glad you liked the explanation! :)
Very good explanation with a good use of animations!
Thanks! Glad you think so! :D
Perfect explanation, exactly what I was looking for!
Thanks! Glad you found it helpful! :)
Good explanation. Thank you very much for this.
Thanks! Glad it was helpful! :)
At 6:37 there seems to be a mistake: the "closest to query" point was already present in Level 2 but was not selected. Do you see what I mean?
Exactly!
Very nicely explained! Thank you for making this video
Glad it was helpful! :)
When initially building the small world, we need to iteratively look for the k nearest neighbors while inserting new documents. How do we find those neighbors?
great explanation
Thanks! Happy to hear that you liked the explanation! :)
Hey, I have a question: isn't there a risk of getting stuck in a local optimum when comparing similarity between the query node and the DB nodes in the graph?
Good question! There's always a chance of getting stuck in a local optimum because you're essentially using a greedy algorithm here. That's why you usually run the search multiple times from different entry points, which reduces the chance of that happening.
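For anyone curious, here's a minimal sketch of what that greedy search with random restarts could look like in Python. Everything here (the adjacency-dict graph layout, the function names) is a hypothetical illustration, not code from the video:

```python
import numpy as np

def greedy_search(graph, vectors, query, entry):
    """Greedily walk the graph from `entry` toward the query.

    graph:   dict mapping node id -> list of neighbor ids
    vectors: dict mapping node id -> np.ndarray embedding
    Returns (node, distance) where no neighbor is closer (a local optimum).
    """
    current = entry
    current_dist = np.linalg.norm(vectors[current] - query)
    while True:
        # Move to the closest neighbor if it improves on the current node.
        best, best_dist = current, current_dist
        for nbr in graph[current]:
            d = np.linalg.norm(vectors[nbr] - query)
            if d < best_dist:
                best, best_dist = nbr, d
        if best == current:  # no neighbor is closer: local optimum reached
            return current, current_dist
        current, current_dist = best, best_dist

def search_with_restarts(graph, vectors, query, n_restarts=5, seed=0):
    """Run the greedy search from several random entry points and keep the
    best result, reducing the chance of a bad local optimum."""
    rng = np.random.default_rng(seed)
    entries = rng.choice(list(graph), size=n_restarts)
    results = (greedy_search(graph, vectors, query, e) for e in entries)
    return min(results, key=lambda r: r[1])
```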
How did you go from a group of random vectors to a skip linked list structure?
The nodes between levels represent the same vectors. Basically, on the top level you have a sparse graph of vectors, and on the lowest level you have the entire graph. Similar to a skip list, you move to another node in the same level if it's closer to the query, or move down a level if no such node exists. Because you start at a higher level, each hop can cover a larger distance. Hope this makes sense, and please let me know if you need further clarification! :)
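A minimal sketch of that layer-by-layer descent, reusing the hypothetical greedy_search from the sketch above. The layer layout (a list of adjacency dicts sharing node ids, like the towers of a skip list) is an assumption for illustration:

```python
def hierarchical_search(layers, vectors, query, entry):
    """Descend from the sparse top layer to the full bottom layer.

    layers: list of adjacency dicts, layers[0] = top (sparsest),
            layers[-1] = bottom (contains every vector). Node ids
    	    are shared across layers, so a node found in one layer
            can be used as the entry point for the next layer down.
    """
    current = entry
    for graph in layers:
        # Within one layer, greedily move sideways while a neighbor is
        # closer; once stuck, drop to the next (denser) layer from here.
        current, _ = greedy_search(graph, vectors, query, current)
    return current
```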
If k equals the total number of documents, will this approach also behave like brute force? Because it would need to go through every linked document.
If k equals the number of documents, why not simply return all of them? :)
Could you please put together an NLP basics course? Also a basics-of-ML course, please.
First of all, thanks for becoming a member of this channel! ❤️ I've thought about making an introductory course for either NLP or ML (although the latter is a bit saturated). Stay tuned for updates!