How does Google quickly search for one document among billions of documents?

  • Published: 28 Jun 2024
  • Understand the data structures and internal processes that search engines use to index documents and make them easy to search
    #searchenginedesign #searchsystemdatastructure #designsearchengine #invertedindexes
    #systemdesigntips #systemdesign #computerscience #learnsystemdesign #interviewpreperation #amazoninterview #googleinterview #uberinterview #micrsoftinterview

Comments • 141

  • @shahidbits
    @shahidbits 5 years ago +5

    Great article. Keep these coming, Sir :)

  • @goutamkreddy
    @goutamkreddy 3 years ago

    Nice work laying out the basic building blocks of any search engine! (especially the inverted index section)

  • @Akashkumar-md6rg
    @Akashkumar-md6rg 4 years ago +3

    I can't find such content anywhere on YouTube. These videos show the hard work behind them. U r really great.
    Thnq very much..🙌🙌

  • @ganeshmain009
    @ganeshmain009 5 years ago +3

    Great content on this channel bro..keep up the good work

  • @jheelparikh5365
    @jheelparikh5365 4 years ago +6

    You are excellent at explaining System Design. Watching your videos motivates me to learn in depth and makes my learning experience interesting. There is so much to learn from your videos, and they also show that you enjoy teaching and sharing your immense knowledge.

  • @rohanvardhan2767
    @rohanvardhan2767 5 years ago +16

    Your channel is a one-stop shop for SDI prep. Thank you for another great video. Could you please share a system design for a Yelp-like service?
    Cheers!

  • @Kasatankit
    @Kasatankit 3 years ago

    Bhai you are awesome!!! I cleared so many interviews because of you!! Sending you more power and may all your wishes come true!

  • @iitgupta2010
    @iitgupta2010 5 years ago +11

    Great, it's good knowledge on inverted index search. That's cool.
    But I was expecting a bit more on the actual design: system diagram, flow, etc.

  • @piotrszkiecin8357
    @piotrszkiecin8357 4 years ago

    Your videos are great!

  • @akashjkhamkar
    @akashjkhamkar 1 year ago

    really cool video man, keep such videos coming !

  • @ArvindChourasia
    @ArvindChourasia 1 year ago

    What an amazing simplified explanation of such a big concept. Thanks for sharing your knowledge.

  • @csie123
    @csie123 4 years ago +27

    11:00
    18:17 noise removal
    26:35 indexMap[keyWord] = [ { which sentenseIdx, pos at which it appears in the string }, ]
    31:20 Conjunction = AND / Disjunction = OR
    31:25 remove duplicates
    34:35 Union: for a three-word phrase, the order matters

  • @raviprakashagrawal9478
    @raviprakashagrawal9478 5 years ago +16

    This guy has done so much hard work to provide free education. Still, a few people disliked this video. Shame on them.

  • @MrQuadraaa
    @MrQuadraaa 3 years ago

    Top content. Thank you!

  • @harishwarreddy9114
    @harishwarreddy9114 3 years ago

    While watching the video I can feel my knowledge increasing. What an explanation!

  • @yzmashuai
    @yzmashuai 4 years ago

    Thx Narendra for this wonderful episode

  • @optimizer_____2420
    @optimizer_____2420 4 years ago

    Very nice explanation sir 👍. Thank you very much.

  • @lalithak6323
    @lalithak6323 1 year ago

    Can't explain in words how useful this video is, thanks a lot!!!

  • @user-uk1md9zs2p
    @user-uk1md9zs2p 8 months ago

    Thank you for sharing! Very clear and learner centered. What an amazing educator!

  • @vrushtijoshi92
    @vrushtijoshi92 4 years ago +1

    This was really helpful. Thank you so much. :)

  • @prabhatkumarsahu3115
    @prabhatkumarsahu3115 4 years ago

    Great super explanation...

  • @phamquangvi4413
    @phamquangvi4413 1 year ago

    Thank you for your interesting video. Thanks a lot.

  • @mohamedamr9203
    @mohamedamr9203 4 years ago

    Thank you, it's a great video

  • @adithyaks8584
    @adithyaks8584 3 years ago

    Very good channel I am enjoying every bit of learning here !

  • @veenuvinod1
    @veenuvinod1 1 year ago

    Bro, you are doing such great work, and free of cost too.

  • @ashish161087
    @ashish161087 3 years ago

    Wow..You are great..

  • @waelalghazouli8024
    @waelalghazouli8024 3 years ago

    Very helpful, thank you very much!

  • @bharathpreetham2840
    @bharathpreetham2840 5 years ago +1

    nice explanation😊

  • @user-oy4kf5wr8l
    @user-oy4kf5wr8l 2 years ago

    amazing.........just amazing......thank u narendra!!!

  • @ricardob.18
    @ricardob.18 1 year ago +1

    Amazing video!

  • @ankitjain8255
    @ankitjain8255 3 years ago

    Such an amazing video, thank you so much Narendra.

  • @kumarc4853
    @kumarc4853 3 years ago

    Narendra Sir is a legend among YouTubers in Sys Design. Very thorough and knowledgeable videos

  • @roshankumar0911
    @roshankumar0911 4 years ago

    Awesome explanation :-)

  • @anastasianaumko923
    @anastasianaumko923 1 year ago

    Very clear! Thank you so much for your work 😌

  • @bvsivakrishna
    @bvsivakrishna 1 year ago +1

    "Basically" I got your point brother! 😂

  • @pravaskumar7078
    @pravaskumar7078 5 years ago

    awesome ....

  • @brijeshshirodkar8784
    @brijeshshirodkar8784 1 month ago

    really enjoyed the illustration of Indexing

  • @w.maximilliandejohnsonbour725
    @w.maximilliandejohnsonbour725 4 years ago

    Nice info...!!!!!.

  • @NitishSarin
    @NitishSarin 5 years ago +8

    You sir, are a legend! 🙏🏻

  • @anchaldubey4217
    @anchaldubey4217 4 years ago

    Very nice explanation sir

  • @abhaytiwari6411
    @abhaytiwari6411 4 years ago

    Very good, guru

  • @priyaravindran4337
    @priyaravindran4337 3 years ago

    Awesome 👏

  • @sanjeebkumargouda1471
    @sanjeebkumargouda1471 3 years ago

    Great Lecture sir !!!
    I enjoyed listening and learning😇😇

  • @524emon
    @524emon 3 years ago

    Great content ... great explanation ... it is difficult to find such content on youtube.

  • @aipaperreader
    @aipaperreader 5 years ago +38

    So basically, it is a great video!

  • @faisalmorensya4936
    @faisalmorensya4936 4 years ago

    great video man!

  • @MoonsuKang
    @MoonsuKang 5 years ago +1

    Great content. Thanks for your video.

  • @AP-tz1ns
    @AP-tz1ns 1 year ago

    really good video

  • @prasadjayanti
    @prasadjayanti 2 years ago

    Good explanation...

  • @adamhughes9938
    @adamhughes9938 3 years ago

    So great

  • @vijaykumar-yq7sf
    @vijaykumar-yq7sf 5 years ago +1

    Great

  • @theultimaterelaxation6839
    @theultimaterelaxation6839 3 years ago

    Thank you so much.

  • @rahulsrivastava1603
    @rahulsrivastava1603 1 year ago

    You earned a sub Sir

  • @vivekrautela6928
    @vivekrautela6928 3 years ago +28

    B-tree indexing has a time complexity of O(log n), as it performs binary search. Why did you say that it takes O(1)?

    • @cool1000nitin
      @cool1000nitin 3 years ago +7

      Correct. What was explained is caching, not indexing, but overall the video is very helpful.
      👍

  • @rahulsadanandan5076
    @rahulsadanandan5076 3 years ago +13

    Once that "basically" sets in, it's hard to get over

    • @RanjuRao
      @RanjuRao 2 years ago

      I literally counted how many times "basically" is used. I appreciate the fact that people like him are going out of their way to contribute to society, but the content would have been sounder if it had been planned and presented more methodically.

  • @vamshiabhilash
    @vamshiabhilash 4 years ago +1

    This is really informative, and thank you for the knowledge. It will be really helpful for me as I'm into digital marketing. Thank you so much, sir, I really loved the way you explained it.

  • @janabodu3392
    @janabodu3392 4 years ago +7

    Can you please increase the volume? It's hard to hear on my laptop. I noticed the same thing with your previous videos as well. But nice and informative videos.

  • @raselahmedb
    @raselahmedb 4 years ago

    Best explanation.
    Please upload a video on a POS system.

  • @chiragkataria4161
    @chiragkataria4161 3 years ago

    Great content man, well explained. My only suggestion is to work on the audio quality; even at max volume it's not properly audible.

  • @kslsantosh
    @kslsantosh 5 years ago +2

    Nice explanation :) One small suggestion: an example where the order of the words matters is the query "Distance between Mumbai to Delhi". This video helped me brush up on information retrieval techniques.

    • @cliffmathew
      @cliffmathew 1 year ago

      When you type "Distance between Mumbai to Delhi", I guess the expectation is to get a number ("26 hr (1,419.7 km) via NH 48") as an answer, and not the documents containing those words. Essentially, we would like the search engine to understand the meaning of our question, and then respond with a factual answer. I suspect that is handled by different techniques like "Semantic search" or "extractive question answering".

  • @madhu9829
    @madhu9829 5 years ago +37

    How do you gather such in-depth details of these systems? Really great work and I'm enjoying it.

    • @TechDummiesNarendraL
      @TechDummiesNarendraL 5 years ago +19

      You can too, keep reading :)

    • @mmanuel6874
      @mmanuel6874 3 years ago

      @TechDummiesNarendraL What books and resources do you recommend?

    • @nabanitasen
      @nabanitasen 3 years ago +4

      @mmanuel6874 The best place is the papers published by these companies and their tech talks.

    • @techtea5911
      @techtea5911 2 years ago

      @mmanuel6874 Google!

  • @swaroopjin
    @swaroopjin 3 years ago +1

    It's a great video, Naren, but I would like to understand how refactoring/removal of an index/keyword happens. Also, does Google store the index across multiple regions, e.g. APAC, AMERICA? How big can the index be in Google's case?

  • @dataguy7013
    @dataguy7013 4 years ago

    @Narendra, thanks for the detailed explanation. Nice work. So if there is a new document with the word brown, how do you update the index? Is updating the index more expensive? Can you do an incremental load of the index?

  • @rajparekh08
    @rajparekh08 3 years ago

    Thanks for taking time to make this video. Basically 😂 explained very well .

  • @poonamgoel8993
    @poonamgoel8993 4 years ago +1

    Very informative, thanks for posting this video. One question: is this how all search engines work, e.g. Windows File System Search, Outlook/Gmail search, web search?

  • @saurabhtyagi6963
    @saurabhtyagi6963 2 years ago

    Make a bit index between words and docs, and then it will be easy to do conjunction and disjunction

  • @sujataroychowdhury178
    @sujataroychowdhury178 4 years ago

    Great video for explaining the nitty-gritty details of inverted search. If you could at least show the full flow of the design, it would be much appreciated.

  • @adamhughes9938
    @adamhughes9938 3 years ago +1

    If you put a donation link on your videos I would send you money at this point, your vids have helped me so thoroughly!

  • @mityabor
    @mityabor 4 years ago +1

    Nice explanation of TF-IDF preprocessing. But it's not clear how ranking and ML are organized

  • @rahulsoni-lx5rb
    @rahulsoni-lx5rb 5 months ago

    🤩🤩🤩

  • @kambalavijay6800
    @kambalavijay6800 2 years ago +1

    @Tech Dummies Narendra L, it's ok to build a matrix for 3 documents. How about a case where there are a billion docs to scan through and build an inverted index matrix. The matrix would be huge right?

  • @shivampradhan6101
    @shivampradhan6101 1 month ago

    Can you make a new video? I think search has moved quite far ahead, from TF-IDF to gen AI embeddings and RAG methods

  • @shubhambansal5487
    @shubhambansal5487 5 years ago +1

    Great job!!
    One important suggestion: use an external microphone to cut off background noise and get better sound quality.
    Otherwise your hard work will not pay off.

  • @rahulsharma5030
    @rahulsharma5030 3 years ago

    Awesome stuff. One doubt:
    In the last minutes, on prefix search: I think that if we store keys in sorted order, it makes insertion complex (search through the table using binary search, then insert). B-trees/tries can be used, but they are not as efficient to search as a hash. What we can do is search jum(a), jum(b), and so on up to z. Since there are 26 letters and a limited number of characters apart from letters, it is more efficient than maintaining a sorted list.

    • @realgabreal
      @realgabreal 1 year ago

      jum* doesn't tell us how long the word would be, so just searching for jum(a), jum(b) and so on would not work, I think

  • @AmanKumar-ey2eu
    @AmanKumar-ey2eu 4 years ago

    Can we have a video on dream11 architecture?

  • @20frieza
    @20frieza 4 years ago

    Great video again, Narendra! I am not sure what the purpose of the frequency in this table is. It looks like we will never use it when querying. Also, why can't the data structure that stores the index be a Map? That way you can have very fast lookups. But I agree you can't do regex-type matching with it.

    • @cliffmathew
      @cliffmathew 1 year ago +1

      I suspect the frequency is used to understand the semantics. A word and words that appear together or frequently around it are all useful information to gauge the relevance of a document.

  • @v.karikaran5973
    @v.karikaran5973 3 years ago

    How did you learn these things? Can you please suggest any book to refer to
    on how search engines work and how to build one?

  • @chaosu2755
    @chaosu2755 3 years ago +1

    If millions of documents contain these words, you cannot run the intersection in memory. That is one critical part we need to focus on. How do we make it work?

  • @rahulsharma5030
    @rahulsharma5030 3 years ago

    How can we update this table when new stuff is added? Shall we store it in Redis/a cache to speed things up?

  • @chitthiaayeehai
    @chitthiaayeehai 5 years ago +1

    Noise removal got slipped ... But it's ok I figured it out... Nice lecture dude.

  • @panprasanta
    @panprasanta 3 years ago

    Can you shed some light on how intersections are performed when you have billions of docs?

  • @kumarch4027
    @kumarch4027 2 years ago

    How will millions of queries be searched over the reverse index? How is the concurrency achieved? Where do union and intersection happen, and how quick are they? Can a query go to the exact shards based on the query words and perform the union there?

  • @AbhishekSharma-bg7ez
    @AbhishekSharma-bg7ez 3 years ago

    Thanks for the great video and explanation. There is an issue in the term-document table. The word "Quick" appears twice in that table, at row 0 (as Quick) and row 8 (as quick), but with different values. This is a bit confusing, as I am not sure if I am missing something.

    • @atulsinghrajput9932
      @atulsinghrajput9932 2 years ago

      Yes, you can apply toLowerCase(); it was a mistake. You can ignore that

  • @somerandomguy000
    @somerandomguy000 2 years ago

    Basically this is a video on how to use basically in all sentences you say

  • @sagartyagi2450
    @sagartyagi2450 3 years ago

    How does fuzzy search work? What if I write Quock instead of Quick? How does that come up as a result?

  • @esakkisundar
    @esakkisundar 2 years ago

    Off topic: have you ever wondered about information retrieval from your brain? You can bring back incidents/emotions/moments in milliseconds, and the brain is not power intensive at all. What an amazing design our brain has.

    • @JasonBechervaise
      @JasonBechervaise 2 years ago

      The brain operates at about 20 watts; this is about 30% more than a Raspberry Pi 4B at theoretical consumption (5 V @ 3 A = 15 W)

  • @GopalRoy-nn6ft
    @GopalRoy-nn6ft 4 years ago +2

    I don't think u have included the technical concepts and the technology they use.

  • @balakrish3387
    @balakrish3387 4 years ago

    Great content... Voice not so clear

  • @lokesh4585
    @lokesh4585 2 years ago

    What if we search for "Jumped"? Do we expect the search engine to lemmatize the search keyword?

  • @eugnsp
    @eugnsp 4 years ago +2

    Suppose that words "quick", "brown" and "fox" are associated with millions of documents. How would you find an intersection of them in a fraction of a second?

    • @ers-br
      @ers-br 1 year ago

      Just compare the 'bits' ...

  • @alekseilitvinau
    @alekseilitvinau 2 years ago

    "basically" it is a nice tutorial xD

  • @keshavyadav
    @keshavyadav 2 years ago

    Please improve the audio quality 👏

  • @sayanroy9161
    @sayanroy9161 3 years ago

    basically

  • @stevemew6955
    @stevemew6955 3 years ago

    Great video! A bit of trivia - "the quick brown fox jumped over the lazy dog" should have an 's' in it -> "the quick brown fox jumps over the lazy dog". It's a typing exercise to hit all 26 letters. en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog. ;)

  • @sushantdev4997
    @sushantdev4997 1 year ago

    can't we use a trie here?

  • @arunachalamk1145
    @arunachalamk1145 4 years ago

    Can you make an in-depth video about networking?

  • @EricEric2004
    @EricEric2004 4 years ago +1

    Basically, the basicality of the basic is quite basical.

  • @ThugLifeModafocah
    @ThugLifeModafocah 2 years ago

    You basically like to say "basically" a lot, huh? LOL. Thank you for the content, keep it up.

  • @preety202
    @preety202 4 years ago

    You need better lighting, it's too dark

  • @sandeepbatchu487
    @sandeepbatchu487 5 years ago +1

    What is the point of storing frequencies?

    • @carrotcarrorfarm
      @carrotcarrorfarm 4 years ago

      It is relevant to search result ranking. The TF-IDF and BM25 concepts will help you
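
Several comments in this thread ask why term frequencies are stored in the index. As a rough illustration only, here is a minimal TF-IDF sketch in Python. It is not the video's or Google's implementation: the function name `tf_idf_scores`, the whitespace tokenizer, and the second and third sample documents are assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf_scores(docs, query):
    """Score each document against the query with a basic TF-IDF sum."""
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency within this document
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # term absent from this document
            idf = math.log(n_docs / df[term])  # rarer terms weigh more
            score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores


# Hypothetical mini-corpus; only the first sentence comes from the thread.
docs = [
    "the quick brown fox jumped over the lazy dog",
    "quick brown foxes leap over lazy dogs in summer",
    "the dog barked at the quick quick fox",
]
scores = tf_idf_scores(docs, "quick fox")
# Document indices ordered from most to least relevant.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy corpus "quick" occurs in every document, so its IDF is zero and it does not affect the ranking; "fox" is what separates the documents. BM25, also mentioned above, refines the same idea with term-frequency saturation and document-length normalization.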