Probably one of the best videos on typeahead suggestion services across all of YouTube, even in 2024. Keep up the good work!
Thank you very much, this is cool. BTW, I'm a little distracted by the soft background music since I'm wearing a headset. It'd be great if you turned the background music off.
Very nice and detailed explanation... thanks a lot! Just wanted to extend it a bit: while precomputing, you don't have to scan the entire subtree for every node. Just work recursively bottom-up; for each node, collect the top 3 suggestions from its children's suggestion lists. It's an extended merge (instead of two arrays, it runs over many small sorted lists, one per child) which can break early once you have collected the top n. Whenever an update comes, you have to update all the prefixes of the updated word: on insertion, if the ranking of the word is higher than the lowest stored one, replace the lowest one; on deletion, check whether the deleted word is present in the current node and replace it with the next-ranked word, again going bottom-up over the affected subtree.
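A minimal sketch of that bottom-up precompute, assuming a hypothetical Node shape (children dict, optional word/rank, cached top list) and using heapq.nlargest in place of the hand-written k-way merge described above:

```python
import heapq

class Node:
    def __init__(self, word=None, rank=0):
        self.word = word          # set only on nodes that complete a word
        self.rank = rank          # higher rank = better suggestion
        self.children = {}        # char -> Node
        self.top = []             # cached top-k (rank, word) pairs, best first

def precompute_top_k(node, k=3):
    """Fill node.top for every node, bottom-up (post-order)."""
    candidates = []
    if node.word is not None:
        candidates.append((node.rank, node.word))
    for child in node.children.values():
        precompute_top_k(child, k)
        candidates.extend(child.top)      # only the child's top-k, never its whole subtree
    node.top = heapq.nlargest(k, candidates)
    return node.top
```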
Thank you Narendra, very detailed yet simple to understand. It's an art to teach that way.
I feel the important part is data collection and how it's going to be updated in the trie.
Similarly, how are we going to scale trie nodes -> a sharding technique to split the data across multiple trie nodes.
Caching at the trie level, probably using a heap to store the top k results, and how Redis or a distributed cache is going to be updated.
It would have been good to talk about how we store the trie, how we scale it, and how we rank the words and update the trie. The content is far from complete. Keep going! ;)
Lol you are an ungrateful person.
Here are my thoughts, gathered after reading through some articles:
1. Tries must be in-memory. A batch task should back up the in-memory trie to a file store as JSON (or any serializable format) to recover in case of node failure. People have also suggested using graph DBs like Neo4j as a recovery backup.
2. Scaling up: I will talk about the Truecaller use case. Users' phone numbers could start with +91 for India or +44 for the UK; a BSNL SIM card starts with +91-82. Maybe these prefixes could be used to load-balance the data set across multiple instances for better sharding. Scaling should be driven by the memory usage of the active services.
3. In the case of Truecaller, they might be using a ternary search tree spanning multiple nodes.
qr.ae/pGrmaL
Please do share your thoughts or points that can add on to this.
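Point 1 above, as a rough standalone sketch: if trie nodes are kept as plain dicts, the whole structure can be dumped to and reloaded from JSON (the file name and node shape here are assumptions, not from the video):

```python
import json

# Assumed node shape: {"rank": int or None, "children": {char: node}}
# (rank is None on nodes that do not end a word)

def save_trie(root, path="trie_backup.json"):
    with open(path, "w") as f:
        json.dump(root, f)        # plain dicts serialize directly

def load_trie(path="trie_backup.json"):
    with open(path) as f:
        return json.load(f)

root = {"rank": None, "children": {"d": {"rank": None, "children": {
        "o": {"rank": None, "children": {"g": {"rank": 5, "children": {}}}}}}}}
save_trie(root)
assert load_trie() == root        # recovered trie matches the in-memory one
```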
@@StockDC2 for someone who is starving, a loaf of bread would be great. For someone who is living a rich life, a loaf of bread is nothing. I hope you understand the analogy.
Yes this is far from complete but I appreciate his effort
I agree! This is far from complete. This will not work in an interview setup.
The time complexity of finding the top K nodes is N log(K), not K log(K): N because we have to go through all candidate nodes, and there will be at most N of them (11:44).
You could also have added database sharding into the scope; the complete trie might not fit on one database server.
Very nicely explained.
You have excellent command of your voice; each and every word is clearly spoken.
You explain concepts very clearly.
Diagrams are clear and you do not rush into concepts.
Narendra great job.
Some important questions to think about :
1. Load balancing strategy
2. Caching strategy
3. How would you modify the frequency without compromising availability or increasing latency
4. Fault tolerance of the system - how would you store the already-built trie so that, in case of failure, the system can be brought back up quickly?
5. What would be the increment factors (per day increase) for the Trie?
6. Any other optimisations on the client side? What about connectivity and session start (i.e., when will you call the suggestion API)?
I really liked the last part, where he talked about different sources of suggestions
Very useful video, thanks for uploading this. BTW, in the precomputing step, we can find the results of all children of a node and then derive the parent's results from those. I think that will make it a little faster.
Can we still store the frequency/priority of each word next to it, and at the parent keep a reference to the top 10 nodes (by priority)? This saves a lot of the space that the hash table takes. Also, we can maintain dynamic updates to priority (based on frequency) by going back from the child node towards the parent, each time updating their respective lists of pointers (or priority queues).
I love this guy. He makes things simple where they are not!
Nice. But how will we distribute the trie in case we have so much data that it can't fit on one server?
best ever video on type ahead search
Thank you, very nice video. A quick question: the time complexity of selecting the top k type-aheads from n nodes should be n log k, right? I assume we use a min-heap.
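For reference, a small sketch of the min-heap-of-size-k selection that gives O(n log k); the ranks below are made up just for illustration:

```python
import heapq

def top_k(candidates, k):
    """Return the k highest-ranked words from (rank, word) pairs in O(n log k)."""
    heap = []                               # min-heap of the best k seen so far
    for rank, word in candidates:
        if len(heap) < k:
            heapq.heappush(heap, (rank, word))
        elif rank > heap[0][0]:             # beats the worst item currently kept
            heapq.heapreplace(heap, (rank, word))
    return [word for rank, word in sorted(heap, reverse=True)]

print(top_k([(10, "dog"), (15, "doll"), (8, "doge"), (25, "door")], 3))
# ['door', 'doll', 'dog']
```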
You know how awesome you are. I love the quality of the material. I hope you never compromise on it. Please keep it coming and don't lose motivation. :) I am definitely a fan.
Hi Naren,
Thanks for uploading your videos. Great content and it's very clear.
I have a small suggestion for this video: we don't have to fully sort once we identify the potential prefix candidates. Let's say we have found 'n' candidates which are leaf nodes; we can build a max-heap over the 'n' candidates and pop the top k, which takes O(n + k log n).
Once again appreciate your effort.
-Raja
Your solution pretty much builds the suggestions from historical data. How would you handle suggestion updates in a more scalable way? Search keywords can be more than 100 characters long, and updating all the nodes in a single trie and its corresponding hash table is not realistic, considering traffic at the Google or YouTube level.
How do we update the rank at runtime? For example, if DOGE gets searched -> the rank changes on all its prefix nodes. How do we take care of that?
Thanks for the video. Question:
How is the trie persisted? Can it be recreated from the prefix hash table? Another question is about the rank of nodes: how is that updated?
Nice work Narendra. The conceptual framework that you initially came up with and then went on to enhance further was really impressive. Very easy to understand for anyone who wants to understand the mechanics of type-ahead. Thank you for your valuable contribution.
excellent explanation sir.. great job.. thank you for your time and effort
Man, you are a very good teacher!
Thank you very much. My learning curve changed after I started watching your videos.
Oh man, this video came 2 months after Google asked me this in an interview. My answer was very similar, and I even explained how the table could be updated to reflect the changes in weights during a day. The interviewer sounded and looked so unimpressed that to this day I wonder what I did wrong. It is amazing that I found this today and realized I was basically right.
I hate how much of interviewing depends on your interviewer.
I guess Google might have something different and they were looking for that exact answer LOL
For the non-precomputing approach, I think the k log k part of the time complexity is incorrect. You have to sort all valid nodes, which is n; in your example it just happens that n = k = 3. If you sort all nodes at the end as in your approach it would be n log n, but we can use a priority queue, which gives n log k. For the precomputation approach, doesn't updating the frequency or adding a new word seem pretty expensive?
Can you not add the background music? It throws me off sometimes. Otherwise, thanks man!
Why do you need a tree? Just store the prefix table in Redis, or cache only the top prefixes and keep the rest in the DB. You have the ranking scores upfront.
I think in order to compute the prefixes to actual words we'll use the trie, since it's fast; to store them with the ranking mechanism, we'll use the prefix hash table.
Pooja Singh I don't think a trie is faster than a hash table: searching a trie takes O(L) and a hash is O(1), but the trie takes less space than the hash. However, he stores the top suggestions in every node, so there is no space saving.
There is still a minor difference. In the trie structure we store only one character per node; there is no need to store the whole word. I guess the author of this video wrote the whole word on each node for demonstration purposes, while for the hash map we need to store the whole word.
@@shuang790228 We need to store the top 10-20 suggestions per string, and we have a limited number of heavily used strings. We can accommodate all of it in a hash, since storage is cheap; a trie in a distributed setup is way too difficult to manage.
In the case of precomputing all the words for a prefix, it would be better if you could cover the time complexity of updates whenever the ranking of the words changes, since a change in ranks leads to update queries across the entire trie and the hash table as well. Reads, I understood, will be fast :). By the way, you are doing a great job, very helpful. Looking forward to hearing from you soon... :)
Suggestions considered. Thanks
Awesome work Narain...
Good for getting an associate-level job; you can't expect to get an architect role with this, guys.
Hey, great work man! But if we store all the auto-complete suggestions in a hash table, won't that be too much space, considering the combinations that can come up? Also, at a later stage, if we decide to save 10 instead of 3 suggestions, won't it be an overhead to recompute all the existing records?
Agree. Also the hash table *isn't faster*. You need to compute the hash of the string which takes O(L) time, which is the same complexity as the trie. (L = length of the prefix)
What if your Redis cluster goes down? How are you going to reconstruct the data?
You also never mentioned the service that is going to write to Redis or the DB.
Where are you exposing the API for the browser to get prediction strings from the system?
Maybe we can have a trie for the initial build, but it is not needed. Anyhow, you will have to add all prefixes to the hash; then, given a word, we can compute its prefixes and map directly into the prefix hash table without a trie.
Why not use a text-search DB like Elasticsearch and cache the results in Redis? Both Elasticsearch and Redis are horizontally scalable. What if the interviewer asks why we are reinventing the wheel?
This video is to show how the wheel works!!
Thank you for your work, great job!😌
I like how you have used a tree data structure to elucidate a conceptual framework that can then be implemented on any platform. Judging from the comments below, some critics didn't get it :) You did awesome!
I think it's a trie, not a tree. Correct me if I am wrong
At the end it is not O(L), where L is the length of the prefix. You are also traversing the subtree for all the words related to the prefix.
Thanks for the wonderful video. Can you please tell me whether we need double the RAM capacity, since we need to store both the trie and the prefix hash table in memory? Or did I miss something?
How would we handle this if we are autocompleting based on sentences? The trie would get huge if we're keeping track of each and every character
I am confused about the trie DS. Do we store only the letters with a wordFlag (true/false), or are we storing the whole word in a node?
Like D(flag=false) -> O(flag=false) -> G(flag=true) instead of D -> DO -> DOG.
Yeah, I am completely thrown off by the explanation and the DS used. This may be a tree, but it is not a trie data structure. Also, the time complexity explained as O(N): that's not how time complexity is calculated at all. There is no input n; the TC or SC should be computed based on the input parameters.
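For reference, a conventional trie stores one character per node plus an end-of-word flag, roughly like this sketch (field names are illustrative, not from the video):

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # one character -> child TrieNode
        self.is_word = False    # the "wordFlag" discussed above
        self.rank = 0           # frequency/score, meaningful only when is_word is True

def insert(root, word, rank=0):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True
    node.rank = rank

root = TrieNode()
insert(root, "dog", 10)
insert(root, "doll", 15)
# Path d -> o -> g stores one character per node; only the final "g" node has is_word=True.
```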
Hey,
I think it would be better to use the prefix hash table directly. It can be built directly: if I have "Hello", we can hash H, HE, HEL, HELL, HELLO. I don't see the use of the trie, since we are storing the top 10-20 suggestions with each prefix; it should be fine. I am not in favour of the trie because:
1. It is difficult to distribute.
2. It is difficult to update: if we add new keys, we have to update all the nodes, and the same can be done in the hash.
3. The trie can be a single point of failure; replication and so on can be done easily if we go with NoSQL.
We can store the top words of the prefix hash table in Redis with an expiry of, say, 10 minutes, and also keep the data in NoSQL or MySQL at the back end as the persistent store (rough sketch below). Each entry's score is updated at the back end, and new data will be reflected within the next 10 minutes (or we can set it to 1 hour). It's much easier to keep the latest results up to date this way than to use a trie and make it complex. Let me know your thoughts.
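A rough sketch of that Redis-backed prefix table, assuming the redis-py client and a reachable Redis instance; the key naming, TTL, and weights are made up for illustration:

```python
import redis  # assumes the redis-py client is available

r = redis.Redis()  # hypothetical local instance

def prefixes(word):
    return [word[:i] for i in range(1, len(word) + 1)]   # "hello" -> h, he, hel, hell, hello

def record_search(word, weight=1, ttl_seconds=600):
    """Bump the word's score under each of its prefixes; keys expire after ~10 minutes."""
    for p in prefixes(word.lower()):
        key = f"prefix:{p}"              # hypothetical key naming scheme
        r.zincrby(key, weight, word)     # sorted set: member = word, score = frequency
        r.expire(key, ttl_seconds)

def suggest(prefix, k=10):
    return [w.decode() for w in r.zrevrange(f"prefix:{prefix.lower()}", 0, k - 1)]
```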
Please also add a discussion of how the ranking of the words should change. You cannot keep your precomputed table as it is all the time: the ranking gets updated whenever a new query hits the table, and updating it at that time will be quite tough. I assume that will surely come up if this question is asked in an interview.
One suggestion for you...please turn off the background music...it actually is a little distracting....noticed some other people requesting this as well
Change your phone
That music is my jam like there's no tomorrow 🎵
I didn't notice until you mentioned, but I really like the music!
I really found the bass line to be soothing in particular.
Great content, appreciate the effort!
Regarding the time complexities, assuming L is the length of the prefix, and N is the number of nodes under the prefix node, and K is the number of suggestions we want, shouldn't the time complexity for sorting be Nlog(K)? Since we will be scanning through more nodes than just K nodes. Please correct me if I'm wrong.
Yes
Questions:
1 - How are we going to persist the trie? If it is in memory, how do we make it resilient to failures (power loss) without downtime?
2 - Are we going to store the entire trie on a single node? If not, how can we split it?
3 - How are the ranking and the precomputation built, and how do they get updated? Full details aren't required, just how the trie would be updated when it receives new data.
Question: does it mean that we maintain a trie for each English letter, or a trie for a range of letters, or just one trie? Theoretically, is it possible to maintain one trie (ignoring scalability)? In that case, where does the search start, since the root of the trie starts with a single letter? I am not sure why all the videos take an example of a single word rather than a phrase! Thank you!
This video is mostly about the trie data structure. Up to the 22-minute mark the discussion is just about the trie; skip ahead if you are looking for high-level system design.
I think the top YouTubers for all interviews are:
Narendra, Gaurav Sen, Aditya Verma
Thanks for the amazing video! 1 suggestion: turn off the bg music, it's distracting.
How does this solution scale for actual autocomplete, which uses phrases/sentences instead of single-word searches? Using a character trie for phrases is incredibly inefficient.
Hm, we are not doing any search here, so I guess it's still efficient with the trie's time complexity. Moreover, we are using the prefix table stored in memory to autocomplete in O(1). Open to any suggestions here.
@@TechDummiesNarendraL I believe it is the same; just treat words like letters in this algorithm.
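That "words as letters" idea, as a tiny hypothetical sketch of a word-level trie for phrases:

```python
class PhraseNode:
    def __init__(self):
        self.children = {}     # whole word -> PhraseNode (words play the role of letters)
        self.is_end = False

def insert_phrase(root, phrase):
    node = root
    for token in phrase.lower().split():
        node = node.children.setdefault(token, PhraseNode())
    node.is_end = True

root = PhraseNode()
insert_phrase(root, "how to train a dog")
insert_phrase(root, "how to train a dragon")
# The shared prefix "how to train a" is a path of 4 word-edges instead of ~15 character-edges.
```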
I should have bought DOGE at this point, the message was clear
Thank you for the video. However, you have plunged into the detailed features without giving a problem statement. I am able to guess what type-ahead is, but shouldn't you frame the details around a broad problem statement? This is tantamount to the interviewee plunging into the details without obtaining a description of the problem to be solved.
Thanks for sharing your knowledge. Could you please make a video on tools like Beyond Compare or Meld, where differences between two files are highlighted? Thanks :)
It's pretty much the Levenshtein distance algorithm.
Javascript implementation gist for Trie: gist.github.com/orberkov/4a31d2260a285beaaa8d8b268f6be682
What architecture should we take on the frontend side? Should we call our API server after each prefix/letter is typed, or can we follow a design that minimizes the number of API calls?
You're suggesting Memcached to store the hash table of all possible prefixes your server will handle?
Thank you for taking the time to share your knowledge.
Why are we storing data in a hash table when we have a trie? What is the use of the trie then?
Hi, thanks for the explanation. However, I have one doubt. When you were saving the top 3 child nodes of every node as a list, why did you select only leaf nodes? As in, for the D node, shouldn't the 3 child prefix nodes with ranking have been {DOG, DOLL, DOGE}? Please let me know if I missed something...
Correct me if I am wrong: if the ranking of the words is user-specific and not global, shouldn't that make the trie user-specific? Considering a large user base, that doesn't seem like a good fit.
A certain user might have some affinity / more liking towards some feature variables, like category, geographical area, gender preference, etc., so based on these input feature variables we can add some weightage to the trie nodes and calculate the results correspondingly.
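One illustrative way to fold such weightage into the score; the features and multipliers below are purely hypothetical:

```python
# Purely hypothetical feature multipliers; a real system would learn these, not hard-code them.
FEATURE_WEIGHTS = {"same_category": 1.5, "same_region": 1.3, "trending": 2.0}

def personalized_score(base_rank, user_features):
    """Scale a suggestion's global rank by whichever user-affinity features apply."""
    score = float(base_rank)
    for feature, applies in user_features.items():
        if applies:
            score *= FEATURE_WEIGHTS.get(feature, 1.0)
    return score

print(personalized_score(100, {"same_category": True, "trending": True}))  # 300.0
```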
The time complexity of searching all words for a particular prefix is wrong; it should be exponential. At each node of a Latin-alphabet trie you can have up to 26 children, and each of them can have up to the same number of child nodes, which gives exponential complexity for traversing the trie.
Can you explain how we can merge data from all three tries having user context, trending data and product info in memory?
Where is the ranking coming from, another team in the company? Although the preprocessing does make things readily available, is this viable? I mean, Google has billions of searches, and preprocessing each and every one of them and setting up ranked suggestions for each sounds like a monolithic task.
I don't know how Google does it, because they have a lot of things that influence rank, and all of these tables need to be updated frequently as well. But for this example you don't need to do anything special for preprocessing, since you can maintain the top n results in constant time per node on insertion itself. You only need to check whether a new entry clears the existing rank; if it does, it simply replaces the lowest-ranked entry.
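A minimal sketch of that update-on-insertion idea, assuming each trie node keeps a small cached top list (the node shape and names are illustrative):

```python
TOP_K = 3

class Node:
    def __init__(self):
        self.children = {}   # char -> Node
        self.top = []        # cached (rank, word) pairs, best first

def update_on_search(root, word, new_rank):
    """Refresh the cached top-k list on every prefix node of the searched word.

    Work per node is O(TOP_K), so one update costs O(len(word) * TOP_K)."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return                                   # word not in the trie; insertion not shown
        node.top = [(r, w) for r, w in node.top if w != word]   # drop any stale entry
        node.top.append((new_rank, word))
        node.top.sort(reverse=True)                  # tiny list, cheap to keep sorted
        node.top = node.top[:TOP_K]
```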
Can you please discuss design for Amazon.com or flipkart.com? or any ecommerce site? Please.
@ 1:03 FB says their response time is 100 ms or 300 ms ? It wasn't clear. Could you please clarify?
Great one! Keep up the good work
Please use a microphone to avoid noise. The audio is not great, but the content is superb. Thanks!
What about misspelled words?
Do you prefer graph database for these kinds of work?
Great explanation !! Keep up the good work.
Great explanation but if you could improve audio quality it would be excellent.
You can't explain how a trie works in a system design interview. They will ask you to move on.
It's a perfect technique for failing the interview: not focusing on the P1 requirement and losing track in other details :D
How does Google generate and suggest phrases (not just one token)? Most of the system designs I've seen only talk about one token.
Thank you for uploading videos. Your channel really helps me a lot! Brilliant content! Keep it coming. Big fan! :)
I have a DOUBT: if we are precomputing the top suggestions for all prefixes, wouldn't that take a lot of space? For example, for the word DONT we have the prefixes {D, DO, DON, DONT}, i.e., n prefixes for a word (n = number of characters). So if there are 10K words with X different characters, wouldn't it be too much to store? Someone please help me with this. Is this how it works? @Tech Dummies
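A rough back-of-envelope with assumed numbers (none of these figures come from the video):

```python
# Rough back-of-envelope with assumed numbers.
words = 10_000
avg_len = 7                      # average characters per word -> ~7 prefixes per word
suggestions_per_prefix = 10
bytes_per_suggestion = 20        # word text plus a rank, roughly

prefixes = words * avg_len                       # upper bound; shared prefixes reduce this
total_bytes = prefixes * suggestions_per_prefix * bytes_per_suggestion
print(total_bytes / 1024 / 1024)                 # about 13 MB, tiny for a 10K-word vocabulary
```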
Are we compromising on the trie when we are building the hash table, or will we keep both?
If a new node is added, how will it update the table?
Thanks for the talk. How do you suggest we fit the trie into the row-based database (24:30)?
I think he means that you use the trie to populate the hash map and store that in Cassandra :)
Well explained sir. Could you please make a video on "stackoverflow" system design ?
Seems like most of system design is just re-using the same diagram and a lot of hand waving. Always some LB, Server, multiple DBs, caching. How is this supposed to be tricky at all? The Trie structure was the only specific detail, but if you do enough LeetCode then this is also obvious.
Still kinda confused: what is the point of the prefix hash table?
Hashing of the string is O(L), meanwhile searching for node in trie is O(L)
Or is the benefit in using distributed hash tables?
IMO it's because tries are much more memory-efficient than using hash, where you have to store all possible strings
What I see is too much space consumption by the trie. Is that right, man?
great job
Awesome video, thanks!
Thank you, this was a great explanation.
Why is there music????
thanks for your efforts bhai...
Hi, I have seen others of your videos which are nicely explained, but this one was a bit disappointing for me: you deviated from the goal of providing a proper system design to just an explanation of how to make the search efficient. It would be nice if, after explaining the efficient way to retrieve suggestions for the autocomplete, you could add the proper system diagram along with an explanation of what each component should do. Thank you.
@2:07 - you said "Third feature" is the number of results we are expecting/sorted results, and then proceeded to write 5. Is this an example of "Type ahead" writing because you were thinking of saying "5 results"? :)
Great stuff, thank you.
Thanks a lot for this video
Actually, wait... wouldn't this dictionary be MASSIVE for someone like Google? How would you even begin to build that trie if you had result sets on the scale they do?
Partition by root levels?
@@TechDummiesNarendraL oh yeah you are totally right, it seems obvious now that I think about it
When we say multiple nodes for the DB and Redis, does it mean the entire database will be copied to each node?
1> Full replication, 2> partial replication (e.g. data specific to a client or a geographical location on one node), and 3> a combination of both; all three are used depending on the type of application. A big application like YouTube would use hybrid sharding (the third type). Just Google 'database sharding and partitioning'.
Thanks a lot, really wonderful. I have a DS question for you, please help me; I need an answer.
Q: we have a city, in which we have locations, and each location has a few restaurants.
One restaurant (e.g. KFC) might be in multiple locations (like HiTech City, Nampally, etc.).
Given a restaurant, get all its locations; given a location, display all restaurants in that location.
What data structure would you use for this problem?
You can do it in two ways.
1. Use maps and build hierarchical data, where the parent node is the place/state/location and the child nodes are restaurants.
You also need one more map which keeps the restaurant as the parent and its locations as children/leaf nodes.
On addition or deletion of items, you need to update both maps.
2. The other way is to use a graph to solve this: have locations and restaurants as nodes of the graph.
Do a BFS to get the answers to your queries; you need to save references to the location entry node and the restaurant entry node somewhere.
Refer to this diagram: imgur.com/RG5hgSX
What about using two maps? One storing Restaurant Key -> RestaurantDetail, and the other storing Location Name -> List. This way we can differentiate between a location, say Marathalli, having multiple KFCs!
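A tiny sketch of the two-maps idea from this thread (names and sample data are made up):

```python
from collections import defaultdict

restaurant_to_locations = defaultdict(set)   # "KFC" -> {"HiTech City", "Nampally"}
location_to_restaurants = defaultdict(set)   # "Nampally" -> {"KFC", "Subway"}

def add(restaurant, location):
    restaurant_to_locations[restaurant].add(location)
    location_to_restaurants[location].add(restaurant)

add("KFC", "HiTech City")
add("KFC", "Nampally")
add("Subway", "Nampally")

print(restaurant_to_locations["KFC"])        # all locations of KFC
print(location_to_restaurants["Nampally"])   # all restaurants in Nampally
```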
He said to use only one DS.
Rtree is best for this kind of problem.
Can you please elaborate a bit
It was more of an algorithms interview than system design.
I didn't get how the time complexity is calculated here.
Great work Buddy... Keep it up..!!
Hi, thanks for another great video; the content is awesome. The sound quality is low, though.