System Design of GitHub Code Search - SDC Episode 1 with

  • Published: Jun 12, 2024
  • GitHub code search allows developers to view and edit code online. This is particularly useful when debugging code remotely (for example, while on vacation)!
    GitHub manages permissions, storage, and retrieval through a set of services. In this video, we look at systems that power its search APIs.
    If you have any doubts or suggestions, please share them in the comments below.
    This is the first episode of the System Design Charcha series. Subscribe for notifications and updates!
    00:00 Problem Statement
    01:55 Capacity Estimations
    02:52 Brute Force Approach
    04:00 High Level Architecture
    06:30 API calls
    11:04 Form of Response Object?
    16:10 API flow
    17:40 Search Engine
    31:10 Summary
    33:10 Peek under the hood
    36:14 Final thoughts
    37:00 Thank you!
    References:
    Numbers every programmer should know: gist.github.com/jboner/2841832
    Github statistics: github.blog/2023-01-25-100-mi...
    Useful Resources:
    InterviewReady: interviewready.io/
    Designing Data-Intensive Applications Book: amzn.to/3SyNAOy
    Social Links:
    Github: github.com/InterviewReady/sys...
    LinkedIn: / interview-ready
    Twitter: / gkcs_
    #SystemDesign #InterviewReady #Coding

Comments • 19

  • @KeertiPurswani
    @KeertiPurswani 2 months ago +9

    Excited about the series!

    • @ritwikachakroborty6973
      @ritwikachakroborty6973 2 months ago +1

      Ma'am, will we have a separate trie for each letter from a to z for an org? How are we deciding the starting letter, given that we have so many words?

  • @mikestaub
    @mikestaub 2 months ago +2

    The discussion at 16:10 was basically the core issue with all backend systems. Extremely useful discussion.

  • @venkateshr1193
    @venkateshr1193 2 months ago +2

    I would have liked to see a bit more detail about:
    1. How do we deal with concurrent reads and writes on the trie?
    2. How do we partition the trie?
    3. If it is an in-memory trie, what are the memory requirements, and how do we rebuild the trie after a pod failure?
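Points 1 and 2, along with the earlier per-letter question, can be illustrated with a minimal sketch. It assumes the index is split into shards by the first character of each word. ShardedTrie, num_shards, and file IDs like "a.c" are hypothetical names and do not come from the video. Each shard has its own lock, so a write to one shard never blocks work on another. Python's standard library has no readers-writer lock, so a plain threading.Lock also serializes reads within a shard. A real system would likely want something finer-grained.

```python
import threading


class TrieNode:
    """One prefix-tree node; children are keyed by character."""

    __slots__ = ("children", "files")

    def __init__(self):
        self.children = {}
        self.files = set()  # IDs of files containing a word that ends here


class ShardedTrie:
    """Hypothetical sketch: words are routed to a shard by their first
    character, and each shard has its own lock, so writes to one shard
    never block reads or writes on another shard."""

    def __init__(self, num_shards=4):
        self.num_shards = num_shards
        self.roots = [TrieNode() for _ in range(num_shards)]
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def _shard(self, word):
        # ord() gives a stable mapping (unlike str hash(), which is salted per process)
        return ord(word[0]) % self.num_shards

    def insert(self, word, file_id):
        i = self._shard(word)
        with self.locks[i]:
            node = self.roots[i]
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.files.add(file_id)

    def search_prefix(self, prefix):
        """Return IDs of files containing a word that starts with prefix."""
        i = self._shard(prefix)
        with self.locks[i]:
            node = self.roots[i]
            for ch in prefix:
                node = node.children.get(ch)
                if node is None:
                    return set()
            # Walk the subtree below the prefix and collect every file ID
            result, stack = set(), [node]
            while stack:
                current = stack.pop()
                result |= current.files
                stack.extend(current.children.values())
            return result


trie = ShardedTrie()
threads = [
    threading.Thread(target=trie.insert, args=(word, path))
    for word, path in [("include", "a.c"), ("index", "b.py"), ("main", "a.c")]
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(trie.search_prefix("in")))  # ['a.c', 'b.py']
```

Routing by first letter is simple, but it can skew load because far more identifiers start with some letters than with others. That skew is one reason real systems tend to hash or range-partition instead.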

  • @stym06
    @stym06 2 months ago +4

    Shouldn't the architecture be more practical and detailed instead of theoretical? You've discussed using tries, but how do you handle distributed reads and writes to them? Wouldn't Elasticsearch be a better option? The primary task here is document search. Why involve theoretical data structures instead of actual projects that use those data structures and are used in the industry? Text search is almost always done using ES in most companies.

  • @swanv951
    @swanv951 2 months ago +2

    Could we have used Elasticsearch instead of managing the trie ourselves? Would Elasticsearch be able to update the index efficiently when the underlying document (a code file in this case) changes?

  • @saiteja2993
    @saiteja2993 2 months ago +2

    We can have another trie, preprocessed at the file level, so that if any deletions happen we can delete that particular file's trie and generate a new one, instead of going through the entire trie of the repo.

    • @gkcs
      @gkcs 2 months ago

      That's an interesting idea!
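The per-file idea can be sketched as follows. The sketch assumes one small trie per file path and uses a simple identifier regex as the tokenizer. FileTrie, RepoIndex, upsert_file, delete_file, and TOKEN are hypothetical names. Updating or deleting a file touches only that file's trie.

```python
import re

# Assumed tokenizer: identifier-like words only
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class FileTrie:
    """Minimal prefix tree built from the words of a single file."""

    def __init__(self, words):
        self.root = {}
        for word in words:
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node[None] = True  # end-of-word marker; None never collides with a character

    def has_prefix(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node:
                return False
            node = node[ch]
        return True


class RepoIndex:
    """Hypothetical repo-level index that keeps one trie per file, so a
    changed or deleted file rebuilds only its own trie."""

    def __init__(self):
        self.tries = {}  # file path -> FileTrie

    def upsert_file(self, path, content):
        # Re-index just this file; other files' tries are untouched
        self.tries[path] = FileTrie(TOKEN.findall(content))

    def delete_file(self, path):
        self.tries.pop(path, None)

    def search_prefix(self, prefix):
        # Trade-off: every file's trie is consulted at query time
        return sorted(p for p, t in self.tries.items() if t.has_prefix(prefix))


repo = RepoIndex()
repo.upsert_file("main.c", "#include <stdio.h>\nint main() { return 0; }")
repo.upsert_file("util.py", "def index_files(): pass")
print(repo.search_prefix("in"))  # ['main.c', 'util.py']
repo.delete_file("util.py")
print(repo.search_prefix("in"))  # ['main.c']
```

The cost shows up at query time. Every file's trie is consulted, so lookups slow down as the repo grows. A single repo-wide trie, by contrast, answers a prefix query in one walk.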

  • @sudeepchoudhary5467
    @sudeepchoudhary5467 2 months ago

    In the search engine, can we use something like a Lucene index, which uses an inverted index?
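For reference, the core idea of an inverted index, the structure Lucene builds on, is a map from each token to the set of documents containing it. The toy version below is a sketch under stated assumptions: identifier-style tokens, case-insensitive matching, and AND semantics for multi-word queries. InvertedIndex and its methods are hypothetical. They are not Lucene's API or its on-disk format. In this sketch, re-adding a document is treated as delete-then-add.

```python
import re
from collections import defaultdict

# Assumed tokenizer: identifier-like words only
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class InvertedIndex:
    """Toy inverted index: maps each token to the set of documents
    (its posting list) that contain it. Illustrates the idea only."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> doc IDs
        self.doc_tokens = {}              # doc ID -> tokens, for cheap removal

    def add(self, doc_id, text):
        self.remove(doc_id)  # re-adding a document replaces its old version
        tokens = {t.lower() for t in TOKEN.findall(text)}
        self.doc_tokens[doc_id] = tokens
        for token in tokens:
            self.postings[token].add(doc_id)

    def remove(self, doc_id):
        for token in self.doc_tokens.pop(doc_id, set()):
            self.postings[token].discard(doc_id)
            if not self.postings[token]:
                del self.postings[token]

    def search(self, query):
        """Return documents containing every query token (AND semantics)."""
        tokens = [t.lower() for t in TOKEN.findall(query)]
        if not tokens:
            return []
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return sorted(result)


idx = InvertedIndex()
idx.add("a.c", "#include <stdio.h> int main")
idx.add("b.py", "def main(): print('hi')")
print(idx.search("main"))          # ['a.c', 'b.py']
print(idx.search("include main"))  # ['a.c']
idx.add("a.c", "int helper")       # update: the old tokens are dropped
print(idx.search("main"))          # ['b.py']
```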

  • @peeyushyadav4991
    @peeyushyadav4991 2 months ago

    Is it likely that they (GitHub) implemented something equivalent to a Razor View for returning the HTML response?

  • @sritejaparimi6605
    @sritejaparimi6605 2 months ago

    Great video!! Does this also handle a string inside a bigger word? For example, a file contains the string "include" but I am only searching for "clude": would we get results? It doesn't seem like it.

    • @gkcs
      @gkcs 2 months ago

      It would be possible with tries that store the reversed words.
      There are also suffix tries. At that point, it's better to use a known solution like Elasticsearch, which internally uses these algorithms.
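A minimal sketch of the suffix-trie approach follows. It assumes an uncompressed trie and exact, case-sensitive matching. SuffixTrie, add_word, and find_substring are hypothetical names. Inserting every suffix of every word turns a substring query like "clude" into a prefix walk. A trie of reversed words covers only suffix matches such as "clude". The suffix trie also covers inner substrings such as "clud".

```python
class Node:
    __slots__ = ("children", "files")

    def __init__(self):
        self.children = {}
        self.files = set()  # files with a word whose substring passes through here


class SuffixTrie:
    """Sketch of an uncompressed suffix trie over words: every suffix of
    every word is inserted, so any substring becomes a prefix of some suffix."""

    def __init__(self):
        self.root = Node()

    def add_word(self, word, file_id):
        # Insert word[start:] for every start: up to n(n+1)/2 nodes per word
        for start in range(len(word)):
            node = self.root
            for ch in word[start:]:
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = Node()
                node = child
                # Tag every node on the path so a lookup needs no subtree walk
                node.files.add(file_id)

    def find_substring(self, query):
        node = self.root
        for ch in query:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.files)


st = SuffixTrie()
st.add_word("include", "main.c")
st.add_word("exclude", "filter.py")
print(sorted(st.find_substring("clude")))  # ['filter.py', 'main.c']
print(sorted(st.find_substring("inc")))    # ['main.c']
print(st.find_substring("xyz"))            # set()
```

The cost is memory. A word of length n adds up to n(n+1)/2 nodes, which is one reason compressed suffix trees or n-gram indexes are often used instead.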

  • @piyushpathak1186
    @piyushpathak1186 2 months ago +1

    Can we get a Gaurav Sen + Arpit Bhayani collab someday?

  • @ankitbhandari7608
    @ankitbhandari7608 2 months ago

    Don't you think you could have used a ready-made solution for text indexing, like Elasticsearch, instead of a trie?

  • @piyush814
    @piyush814 2 months ago

    What is the tool he is using to draw flowcharts?

  • @sunilsurana4767
    @sunilsurana4767 2 months ago

    With a trie, you will only be able to search full words. GitHub can search even if your query starts from the middle of a word. It can also search for two combined words. None of these capabilities can be supported by a trie.

    • @gkcs
      @gkcs 2 months ago

      Have a look at suffix tries. Explaining the algorithm in a system design interview wouldn't be feasible, but it's an interesting real-life implementation.

    • @sunilsurana4767
      @sunilsurana4767 2 months ago

      @gkcs Understood. Thanks for the video. Very informative.