System design : Design Autocomplete or Typeahead Suggestions for Google search

Поделиться
HTML-код
  • Опубликовано: 6 ноя 2017
  • System design: How to design an autocomplete feature for search engine like Google or Bing. Design should be scalable/available/durable.
    / tusharroy25

Комментарии • 282

  • @KanagaveluSugumar
    @KanagaveluSugumar 6 лет назад +4

    You rock as usual, I can see your improvements from solving a problem from single machine to a distributed machines techniques, like zookeeper, load balancer, aggregators... Keep teach us these we are also forwarding with you! Again thanks for the TIPS like CDN and caching future items before the requests.

  • @anssha2643
    @anssha2643 6 лет назад

    Its really great to see you back with the design questions.

  • @konstantinchernov1477
    @konstantinchernov1477 5 лет назад +1

    Thanks for all the great work you do Tushar, it's always a pleasure to see your amazing yet humble videos.

  • @biswajitsingh8790
    @biswajitsingh8790 6 лет назад +147

    the legend is back 😍😍

    • @CALVlNXD
      @CALVlNXD 6 лет назад +10

      i forgot what love felt like

    • @mohanish35
      @mohanish35 2 года назад +3

      And he disappeared again

  • @ksankitha
    @ksankitha 6 лет назад +23

    your videos are much awaited! Please keep them coming

  • @divyeshgaur
    @divyeshgaur 5 лет назад +1

    Hi Tushar, I want to say thank you for all your beautiful work. Keep them coming.

  • @arthamsreenivas8858
    @arthamsreenivas8858 4 года назад +1

    This is great video which covers both pre-processing data flow as well search, great job Tushar bhai!

  • @kylcross
    @kylcross 6 лет назад +47

    I’m loving these new videos. You’re obviously an amazing human being. I’m a little saddened that you’re into the nfl but I think I’ll survive. Thanks Tushar!

    • @tusharroy2525
      @tusharroy2525  6 лет назад +11

      you are not nfl fan?

    • @kylcross
      @kylcross 6 лет назад +4

      You replied! I'm NBA/WNBA all the way. I have a hard time with how dangerous football is. But in Seattle you don't have NBA so I can't blame you too much! Thanks again Tushar!

    • @ulanbayaliev6831
      @ulanbayaliev6831 4 года назад +2

      @@tusharroy2525 In my mind you went to the weekend game and started making 'system design' video right after you are home. Respect for engineers like you!

  • @rahilshaikh5181
    @rahilshaikh5181 5 лет назад +1

    Crisp and on point videos. Thanks for delivering high quality content :)

  • @syedghufranhassan3858
    @syedghufranhassan3858 4 года назад

    This is my first time that I am watching you and you explanined whole design in very easy and convenient way. Count me in as your fan :)

  • @nikkis8102
    @nikkis8102 6 лет назад

    You are back!!! YAY!!! Thank you!!!

  • @shirishtekale549
    @shirishtekale549 5 лет назад +1

    You really are giving a precious knowledge to all of us, i am greatly grateful to you on all your help and time.

  • @mhimanshu91
    @mhimanshu91 6 лет назад

    Great! Out of the box! Thanks Tushar 🤗

  • @myhinditech8723
    @myhinditech8723 6 лет назад +2

    Very helpful videos and I love it and I appreciate it on the RUclips!

  • @mehdikhosravi
    @mehdikhosravi 5 лет назад +1

    Truly amazing content. Watching your videos is significantly adding to my knowledge.

  • @rembautimes8808
    @rembautimes8808 3 года назад

    I watched several autocomplete tutorials and this was the most conceptually complete in terms of coverage e.g using time as weight.

  • @HXYZZZ
    @HXYZZZ 4 года назад

    Excellent video. Thanks Tushar for sharing

  • @wenbobu8314
    @wenbobu8314 6 лет назад +18

    Nice video!
    One suggestion: can we also talk about single point failure and possible solutions for that? This topic seems to help a lot on making design decisions and weighing tradeoffs.

  • @priyanksaxena2196
    @priyanksaxena2196 6 лет назад +1

    This is awesome!! Thanks Tushhar for these system designing videos. I was waiting for them for long time. Are you from Seattle? Your shirt says that. 😊 I was in Redmond for 1.5 yrs. now moved to Portland. Preparing for interviews in Microsoft and Amazon. Want to come back there. So, thank you again for these wonderful tutorials.

  • @indiansoftwareengineer4899
    @indiansoftwareengineer4899 5 лет назад +1

    thanks, tushar sir. you are doing great work. please give us more knowledge.

  • @ahmdalgendi
    @ahmdalgendi 10 месяцев назад

    Thank you tushar, you were my number 1 guy to learn data structures and algorithms, now after graduating and working i am getting back to you to learn about system design
    Again thanks a lot

  • @rssatvikreddy
    @rssatvikreddy 6 лет назад +1

    hey tushar! i'm new to system design and have an interview tomorrow. This video saved me man. Very clear and precise. Keep up the good work!!!

  • @sandeephalder7536
    @sandeephalder7536 6 лет назад

    Tushar rocks!!!
    The implementation with RNN will be smarter.

  • @hlibpylypets1333
    @hlibpylypets1333 2 года назад

    As always, you dig into the details with your explanation. Very helpful

  • @naikkunjan
    @naikkunjan 5 лет назад

    Great insights and explanation

  • @aeon327
    @aeon327 6 лет назад

    Thank you for taking your time to break these things down. This is helping me a lot to prepare for design interviews and helping me learn how to design scalable systems.

  • @gauravkushwaha2009
    @gauravkushwaha2009 6 лет назад

    Amazing explaination.

  • @umangmalhotra1222
    @umangmalhotra1222 5 лет назад

    Hello Mr. Tushar. This is a satisfying video. Thanks.

  • @hindifunnycalls8081
    @hindifunnycalls8081 6 лет назад

    very good video about Design Autocomplete or Typeahead Suggestions for Google search I love it. The teaching quality is awesome!

  • @balakrishnan3725
    @balakrishnan3725 3 года назад

    Nice video. Giving some tips to make the design better is even better. Thank you Tushar!

  • @raghavendarreddyn9903
    @raghavendarreddyn9903 6 лет назад +22

    when you say trie nodes, are these created in the application layer or the back-end(lets say a relational database)? Can we store data in a trie data structure in the database?

  • @hurdurlemur1615
    @hurdurlemur1615 3 года назад +1

    love this my dude
    thanks for your videos

  • @arunabhhejib1136
    @arunabhhejib1136 6 лет назад +35

    Hi Tushar, Could you please make design videos on some of the common questions like Elevator, bookmyshow, social media website like Facebook, etc. Looking for things like how do we process the request, class diagrams and the type of database, etc used. Also, I wanted to thank you for all your videos, have watched many of them and learnt a lot.

  • @veronicalamb5330
    @veronicalamb5330 5 лет назад

    Thank you so much for such a clear explanation!

  • @boredhuman23
    @boredhuman23 6 лет назад +1

    Thanks, very informative. I just understood simple request flow and server distributed caching. 😊

  • @md.abdullahal-alamin8059
    @md.abdullahal-alamin8059 5 лет назад

    Brilliant as always. thanks

  • @gdthangakumar
    @gdthangakumar 6 лет назад

    Thanks Tushar. You are videos helps a lot :)

  • @angappanganesh
    @angappanganesh 6 лет назад

    Amazing stuff man!

  • @infraops5184
    @infraops5184 6 лет назад

    Great Video

  • @mayurmahajan1503
    @mayurmahajan1503 4 года назад +2

    I used to come to Ashish sir's class in Seattle and you used to teach us there. There used to be another person teaching along with you, who kinda looked like an angry lad lol. I learned a lot from those weekly sessions. Please make more videos. Thanks for the great work!

  • @EbrimaTunkara
    @EbrimaTunkara 6 лет назад +5

    I think for distributed storage and cache, Redis could be useful in which the trie data can be partition into multiple shard redis servers . No need for zookeeper in maintaining the trie metadata which is bit a complicated and creates some overhead

  • @surabhigoyal9504
    @surabhigoyal9504 4 года назад

    Amazing Video!

  • @dibyajyoti_ghosal
    @dibyajyoti_ghosal 12 дней назад

    Why do you not make any more videos? These videos are literally gems! I wish you had made more videos on the system design!

  • @nitishkumar-yi9yi
    @nitishkumar-yi9yi 6 лет назад +1

    Hi Tushar. It's a great video you have come up with. A big thumbs up for this. I hope that the splitting of the data in zookeeper is done with help of indexing techniques like SOLR. It would be really great if you can do a video on indexing or suggest some good links for the same.

  • @benleung6331
    @benleung6331 5 лет назад +1

    A couple questions I have
    1) From what I understand N1, N2, ...are replicated server nodes to improve the scalability of the system and T1, T2, T3 are replicated trie data structure to implement the auto-complete feature, but do they occupy in two separate sets of physical machines or could each of the server node has its own of trie data structure. What is the pro and con of two different set up?
    2) You mention about optimizing the system by leveraging CDN, where would the CDN sit in the system diagram? Between the client machine and the load balancer or between the load balancer and the N1, N2..server nodes. If it is the latter, how is the CDN different from the server nodes themselves?

  • @GDBthe1st
    @GDBthe1st 6 лет назад

    Thanks this is a great video and very nice ideas

  • @bishwajitpurkaystha3075
    @bishwajitpurkaystha3075 5 лет назад +13

    Hi Tushar dada,
    I liked your solution.
    One thing I wanted to mention is that the cache and the zookeeper are the single point of failures in the proposed design. For caches, we could cluster Redis (instead of memcached), and make them highly available by introducing redundancy.
    We could add more passive zookeepers as well.
    Let me know how you think of it.
    - Bishwa

  • @InfogramTV
    @InfogramTV 2 года назад

    Hi Tushar,
    Very neat explanation.
    Thanks.

  • @pradiplamsal1403
    @pradiplamsal1403 3 года назад

    Nice, simple explanation. Thank you very much.

  • @mohanrao487
    @mohanrao487 5 лет назад

    Nice One !

  • @aaronkatz9411
    @aaronkatz9411 3 года назад

    Thanks, great explanation

  • @MiztaHall
    @MiztaHall 3 года назад +1

    I'm working at a autocomplete feature. But I'm here mainly for the jersey. GO HAWKS!

  • @anmoljawa8367
    @anmoljawa8367 3 года назад +1

    More more more ... more videos on system design plzzzzzz !!

  • @victoralvarez3481
    @victoralvarez3481 5 лет назад +4

    Since you're sharding your distributed trie, you could also shard/resource intelligently, for example have a larger pool (and larger instances) for frequently accessed tries, since the access patterns definitely won't be uniform.

    • @andreitolkachev8295
      @andreitolkachev8295 3 года назад +1

      can't believe this doesn't have more upvotes! It's what we do too at work

  • @mohamadazmath
    @mohamadazmath 6 лет назад +1

    watched your tiny url and this video. thanks for sharing knowledge. the kind of application and companies used elastic search. request goes to LB, then to app server, server pull data from solr/lucene/elasticsearch/sphinx. this worked for us in terms of scale, latency, availability. data get refreshed every few hours based on the need. internally all these works based on inverted index.

  • @pingqiu7318
    @pingqiu7318 6 лет назад

    Hi, Tushar, your videos are awesome. Would you make more on system design? Thanks

  • @nishitgoyal970
    @nishitgoyal970 5 лет назад

    Great video Tushar..Can you put some video which is covering authentication concepts too..

  • @janarthananrajendran8528
    @janarthananrajendran8528 6 лет назад +2

    Thanks for the nice explanation. One question. wouldn't dumping the entire data into DB cause performance issues for both read and write ? Just a thought maybe acquire locks at node level that needs to be updated and update only those and the relevant parents to that node (top K items) ?

  • @siangjin3456
    @siangjin3456 6 лет назад

    For the data collection session, I think it would be good to use Hadoop to do some batch processing, then we can have the words with frequencies, the build the trie tree database from there.

  • @revanthkumar1183
    @revanthkumar1183 6 лет назад +3

    I would use a debounce in the search textbox handler and make calls only after a certain timeout. This network bandwidth and a lot of server cpu cycles.

  • @gabokings260388
    @gabokings260388 6 лет назад

    Good job 👍🏼, It would be great if you made videos about ML

  • @Gamabunta24345
    @Gamabunta24345 3 года назад

    This is a really good solution

  • @Chandan-io3jm
    @Chandan-io3jm 6 лет назад

    keep uploading videos they are great

  • @xuedong360
    @xuedong360 3 года назад

    great !!!!

  • @ThakurArjun247
    @ThakurArjun247 5 лет назад +1

    Cassandra stores timestamp for all the columns/rows (during insert/update) so basically we don't need to store the date time explicitly

  • @occo5877
    @occo5877 4 года назад

    This was super useful thanks

  • @nishithakur424
    @nishithakur424 6 лет назад +1

    sir ji...you are great
    🙏🙏🙏🙏🙏🙏🙏

  • @michaelhuston
    @michaelhuston 6 лет назад

    Great video! Could you share a little bit more about how an Applier batch job takes the phrases and weights in its range and turns them into the top K terms for each phrase? Thanks!

  • @saikishorer
    @saikishorer 6 лет назад

    Nice video Tushar. Really helped a lot to me.
    I have a small question,Is Trie structure(T1,T2,T3....) will be stored in the DB as a pieces or replicas of original. Eg: Zoo Keepr has a distributed mapping like T1 ->a to f.... So, T1 DB holds only the combinations starts from a to f?

  • @sanke7982
    @sanke7982 6 лет назад

    Tushar, may we have videos on Saga transaction pattern, session storage/caching as well as api key storage/caching? Anyways great work as usual.

  • @ST-nm7pr
    @ST-nm7pr 3 года назад

    dont let this great content distracted you from the fact that settle should have ran the ball

  • @zaheershaik
    @zaheershaik 6 лет назад

    Best video on Internet on practical usage of trie for real usescase :)

  • @KashiPrayag
    @KashiPrayag 4 года назад

    Legend

  • @ashutosh6060
    @ashutosh6060 5 лет назад

    I have to say your content is thorough and deep. I just don't like the way you spell words with 'T' :D but i can bear with that.

  • @mvsmurty
    @mvsmurty 6 лет назад +1

    Very nice explanation Tushar. I just would like to understand how we can store trie in Database in Redis? or do we maintain the entire trie in memory by with the help of typical trie operations whenever there is new entry to be added to the trie? Is there any other blog or suggestion available? Hope to hear from you.

  • @pathakshardul
    @pathakshardul 3 года назад

    Thanks for the video! Question I have is what are you using for persistent storage?

  • @shivroh7678
    @shivroh7678 6 лет назад +1

    Awesome, in the next design videos please provide more details which no-sql db or cache to choose in this condition.
    I mean which cache/db could be beneficial to have in this scenario over any-other.

  • @hehhehdummy
    @hehhehdummy 6 лет назад +2

    Nice jersey!!

  • @sirajmemon8801
    @sirajmemon8801 11 месяцев назад

    For the aggregation, I would probably go ahead with something like HBase and run map reduce as it is a typical "word (phrase) count" problem and we can store the aggregated data into a relational DB or even in a NoSQL one like Cassandra with a TTL.

  • @rashidaidun4880
    @rashidaidun4880 6 лет назад +5

    Great video using Trie structures. Thank you.
    In a real solution for companies using SOLR or Elastic Search, would they use the n-gram analyzer or pre-compute the best tries and use that as part of the index (instead of a custom trie structure and plumbing) ?

    • @tusharroy2525
      @tusharroy2525  6 лет назад +3

      You are right. Probably they are using SOLR or something like SOLR.

  • @wenlaizhang9017
    @wenlaizhang9017 5 лет назад +2

    For the zookeeper, we can still use load balancer here, why we choose zookeeper, it has some benefits ?

  • @chethan1391984
    @chethan1391984 4 года назад

    Thanks Tushar. Appreciate your video.
    1. Rationale behind choosing the technologies is missing ex: Zookeeper (also list alternatives)
    2. How did you arrive that the data retrieval needs caching ? Also how is cache invalidated when trie is built is missing.
    3. Instead of T1,t2,t3, there could be one more load balancer ?
    4. Applier complexity needs to be explained better. Its not simple to query all the words from a data store and build a trie.
    5. How do you replace the trie on live nodes needs better explaination.

  • @CALVlNXD
    @CALVlNXD 6 лет назад +7

    i knew my man Tushar was a seahawks fan

  • @yangmyfly
    @yangmyfly 4 года назад

    100% better than
    Grokking the System Design Interview

  • @anandchowdhury9262
    @anandchowdhury9262 6 лет назад

    off topic but the t shirt is cool!

  • @vardaanbajaj3181
    @vardaanbajaj3181 4 года назад

    Hi, thanks for the video. Can you please tell why we need to use distributed cache like redis in this case since the tries will also be in-memory only?

  • @Avinashkumar-fo2bu
    @Avinashkumar-fo2bu 3 года назад

    Hi Tushar bhaiya please upload all your data structure and algorithm video with test cases in hindi also.this will be useful for us for competitive examinations and interview

  • @joelami3136
    @joelami3136 6 лет назад

    hi,I really love your video! Thanks a lot! And i hope you can share some work you have done in your real work !

    • @tusharroy2525
      @tusharroy2525  6 лет назад +3

      +joe lami work on really interesting distributed system. But can't share anything without compromising my job or even getting fired.

  • @Ttj963
    @Ttj963 4 года назад +1

    Can you explain how splitting of ranges works in little bit more depth , like how we are move range between nodes? Also how applier sitting on a diff node can move the in-memory Trie on to another node , say T1? .

  • @p111calcutta1
    @p111calcutta1 6 лет назад +1

    Thanks Tushar for nice video. Had you thought about OOPS design questions like Design an elevator, chess game ?. Any suggestions for that

    • @tusharroy2525
      @tusharroy2525  6 лет назад +1

      +Piyush Tiwari Honestly not my kind of questions.

  • @rahulagarwal841
    @rahulagarwal841 6 лет назад +11

    I was asked in Microsoft Design round: Design a system to allocate memory for queues. Operations like create, enqueue , dequeue should happen in O(1). Memory efficiency should over 90%. Similar question was asked in adobe. Please make a video on memory allocation. I could not find much content about memory allocation on ytube.

    • @NachiketJoshiTheBloodMage
      @NachiketJoshiTheBloodMage 6 лет назад

      hello Rahul,
      can you please tell me what other questions did they ask to you?
      What DS question and how did you approach for this questiona too?

    • @skootergofast123
      @skootergofast123 5 лет назад

      Linked List Implementation should do for that

    • @skootergofast123
      @skootergofast123 5 лет назад +1

      keeping two pointers head and tail....enqueue at head and dequeue at tail

    • @vector4067
      @vector4067 5 лет назад +1

      The proper way to handle this question is to ask more questions before you approach the problem. Are the required memory same size? e.g 512 k per query?
      Or there are different size queries?
      The basic under laying approach is not to allocate and release memory randomly.
      Memory allocation can be expensive. Bottom level data structure should be memory pool with fixed size chunk memory which you allocate at initialization. and chop up based on the query size. You manage the the allocate and release manually by maintain a pointer to the valid memory address.
      If there are 3 types of queries, you should have 3 memory pools chopping up with 3 different sizes.
      On the upper level logic, yes, you do round robin with a queue.

  • @mostinho7
    @mostinho7 4 года назад

    Thanks!

  • @scotts4089
    @scotts4089 6 лет назад +2

    Question about the Zookeeper: so we are basically using zookeeper because there's no available data-structures provided by common distributed data storage correct? Then I am a bit curious what the system requirements for those T instances would be. In-memory Trie structure seems reasonable but in-practice would it be robust enough? Second, appliers would be some batch Spark jobs running in the background I suppose. If the sum of the weights can be calculated in some state-aware fashion, I am guessing using Apache Samza is also reasonable, instead of running batch jobs. What do you think?

  • @elachichai
    @elachichai 3 года назад

    To split the load between T1,T2,T3 - can (should) we have a 2nd layer of load balancing?
    Can you please also give examples for components for aggregators, appliers just like you already did for cache (Redis), KV (ZooKeeper), NO SQL DB (Cassandra)?

  • @prashantmehta5217
    @prashantmehta5217 6 лет назад +1

    How the trie is implemented in the discussed architechture. Is it in memory?? or some Database. which database supports trie?

  • @NachiketJoshiTheBloodMage
    @NachiketJoshiTheBloodMage 6 лет назад +3

    they asked me this question in one of the interviews,
    the emphasis was on,
    1. how will you store messages? data structure? I said tree of hashmaps.
    2. how will you manage scrolling up through the messages?
    3. how will you decide how much will be stored in users device and then on the cloud (if you want cloud)?
    4. how will you search in the past like messages from 2 years ago?
    all these questions are again unanswered after watching this video...
    can you please make a video or give a most correct answer?
    Thank you so much for all your videos...you have helped a lot of people.

    • @coolswatish
      @coolswatish 6 лет назад +1

      Shouldn't this be a comment on this video: ruclips.net/video/zKPNUMkwOJE/видео.html ?

    • @NachiketJoshiTheBloodMage
      @NachiketJoshiTheBloodMage 6 лет назад

      Yep, I was browsing many videos at a time and was too lazy to change the comment. I was also sure that I am not going to get any answer so did not bother changing at all...

  • @junhuagu3812
    @junhuagu3812 4 года назад

    You must live in Seattle:)

  • @gleventhal
    @gleventhal 5 лет назад

    I probably need to rewatch, but where does he address the client side communication that results in updates with each new keystroke? I assume it's AJAX or Websockets, but where did he discuss that?

  • @kevinhan9307
    @kevinhan9307 6 лет назад +11

    Hi Tushar, can you please do a video on "Median of 2 Sorted Arrays" from Leetcode, where they are of lengths m and n? This is the hardest problem on the leetcode portal

    • @tusharroy2525
      @tusharroy2525  6 лет назад +9

      Soon.

    • @wulymammoth
      @wulymammoth 5 лет назад

      Did you ever figure this problem out? It took me a while, but the Geeks for Geeks piece here helped me out on my path: www.geeksforgeeks.org/median-two-sorted-arrays-different-sizes-ologminn-m/

    • @anilanil166
      @anilanil166 4 года назад

      ruclips.net/video/LPFhl65R7ww/видео.html

  • @ppsanyal
    @ppsanyal 6 лет назад

    Tushar could you please post a video on the following design question " Design facebook live streaming". I have mostly found ideas and half-baked solutions elsewhere. Your videos are pretty comprehensive about how things should be explained in an interview so I would like to know your perspective about approaching this problem.

    • @MosheSiegel
      @MosheSiegel 6 лет назад

      I am just a beginner at system design, but I imagine the following points can be used for Facebook live streaming. 1) HTTPS so all livestreamed-data is secure. 2) Blob storage container such as AWS S3 to store video 3) CDN to optimize others watching the livestream. The CDN would need an algorithm to analyze the locations of the live-streamers audience. The algorithm would take into account number of audience members, their locations, the locations of CDN servers etc. to decide which CDN locations should get a copy. The algorithm could save its results in a database and use that data in the future instead of running the algorithm for each livestreamed video. When the CDN algorithm reruns itself to get new data will have to be based on another algorithm. 4) Load balancers, availability, consistency, etc. could be mentioned. If someone is watching the livestream on the other side of the globe, is it okay for their 'live' video to have a delay of several seconds? The answer to that depends on the business requirements. 5) Something needs to maintain the constant connection between the livestreamer's camera and the host server while the video is broadcasted. Web Sockets would be good for this. 6) The audience would need some way of receiving the video from the Blob storage container. I believe Web Sockets between the client and server would allow the audience to watch the video via a constant connection. But I do not know enough to say whether or not there are better solutions for this than web sockets.