30: LinkedIn Mutual Connection Search | Systems Design Interview Questions With Ex-Google SWE

  • Published: 6 Oct 2024

Comments • 31

  • @AAASHIVAM
    @AAASHIVAM 12 hours ago

    MySQL is not designed to handle 250 PB of data. I believe storing user information in a cache and using an LSM-tree + SSTable family of databases would be a better choice. We can join from the cache. What do you think? Searches can be performed in parallel on the fetched user IDs to filter them out.

  • @ashishgaonker
    @ashishgaonker 3 months ago +7

    Isn't the count 500 × 500 = 250K? That's 10 times more. Or are we assuming only 10% of friends will be mutual, or something like that?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago +2

      Oof yeah good catch. Point is, fan-out probably won't work here.
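      The arithmetic behind this exchange can be checked in a couple of lines (assuming the video's rough figure of ~500 connections per user):

```python
# Back-of-envelope for the fan-out cost discussed above. The figure of
# ~500 connections per user is the video's assumption, not a measured number.
connections_per_user = 500

# When A and B connect, each of A's ~500 connections pairs up with each of
# B's ~500 connections as a new mutual-connection entry (and vice versa),
# so the write amplification per new edge is roughly:
writes_per_new_edge = connections_per_user * connections_per_user
print(writes_per_new_edge)  # 250000 — 250K cache writes for one connection event
```

      That 250K-writes-per-edge figure is why the thread concludes that naive fan-out on connect probably won't work here.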

  • @ichthyz
    @ichthyz 2 months ago +2

    When adding mutual connections from the Flink nodes, how do we know that the new mutual connections are not already direct connections?
    e.g. for 10: 3, 4, 15 you are creating (3, 15) and (4, 15). What if (3, 15) and/or (4, 15) are direct connections? These connections could also be on a different Flink node/partition.

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      Fair point - you can always just hit the database first here. We will have a connections table sharded by user ID, so we know where to look.
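      The "hit the database first" check could look roughly like this (a minimal sketch; `ConnectionsShard`, `shard_for`, and `filter_new_mutuals` are hypothetical names, not anything from the video):

```python
# Before writing a candidate mutual-connection pair, consult the direct-
# connections table (sharded by user ID) to see whether the pair is already
# a direct connection, as suggested in the reply above.
from typing import Dict, List, Set, Tuple

class ConnectionsShard:
    """Stand-in for one shard of the direct-connections table."""
    def __init__(self) -> None:
        self.edges: Set[Tuple[int, int]] = set()

    def is_direct(self, a: int, b: int) -> bool:
        return (min(a, b), max(a, b)) in self.edges

def shard_for(user_id: int, shards: Dict[int, ConnectionsShard]) -> ConnectionsShard:
    # Shard by the lower user ID of the pair so both sides agree on the shard.
    return shards[hash(user_id) % len(shards)]

def filter_new_mutuals(candidates: List[Tuple[int, int]],
                       shards: Dict[int, ConnectionsShard]) -> List[Tuple[int, int]]:
    """Drop candidate mutual pairs that are already direct connections."""
    out = []
    for a, b in candidates:
        lo, hi = min(a, b), max(a, b)
        if not shard_for(lo, shards).is_direct(lo, hi):
            out.append((lo, hi))
    return out
```

      In the thread's example, if (3, 15) is already a direct edge, only (4, 15) survives the filter; the lookup costs one sharded point read per candidate pair.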

    • @ishallwin24
      @ishallwin24 2 months ago +1

      Same doubt

  • @fanzhang5903
    @fanzhang5903 2 months ago +1

    Hi Jordan, loving this video. A couple of quick questions:
    1. For the add-a-connection workflow, is it supposed to be real-time processing or batch?
    2. Let's say B accepted A's invite to connect and A wants to view the change right after; how can we ensure that?
    3. Does it make sense to put the mutual connection data on in-memory cache servers and have a graph DB store the raw connections, so that we can rebuild the cache if any node fails?
    Any idea or discussion is appreciated. Thanks!

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      1) Realtime
      2) You could first write to a table before using CDC to sink to Kafka, and then see the first-degree connection there
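      The pattern in that second answer can be sketched in a few lines (hedged: `connections_table`, `change_log`, and the function names are illustrative stand-ins for the real table, the CDC stream, and the services):

```python
# The connection accept is written synchronously to the connections table
# (the source of truth); a CDC-style tailer asynchronously forwards the change
# to a queue for the mutual-cache pipeline. Reads of 1st-degree connections go
# to the table, so A sees B immediately even before the cache catches up.
from collections import deque

connections_table: set = set()   # source of truth: undirected edges
change_log: deque = deque()      # stands in for CDC -> Kafka

def accept_invite(a: int, b: int) -> None:
    connections_table.add(frozenset((a, b)))     # synchronous: read-your-writes
    change_log.append(("connected", a, b))       # async: mutual-cache update

def first_degree(user: int) -> set:
    return {u for edge in connections_table if user in edge
            for u in edge if u != user}
```

      The key point is that A's "view the change right after" read never depends on the asynchronous cache path.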

  • @SWEcodes
    @SWEcodes 3 months ago +1

    Awesome, great video 🎉

  • @alphabeta644
    @alphabeta644 3 months ago +1

    Thanks for making this video Jordan. I have two questions: a) You mention a "mutual cache table", but it appears you are using a SQL db for that. Doesn't cache mean keeping it in memory? b) It is mentioned that we need very fast reads ("as fast as humanly possible"); shouldn't that call for MongoDB or something like that instead of a SQL db?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago

      Cache doesn't inherently mean memory, it just means having the result of a computation easily accessible. Why would Mongo reads be faster than SQL?

  • @cattnation6257
    @cattnation6257 2 months ago +1

    Keep doing it, it helps us out so much

  • @cattnation6257
    @cattnation6257 2 months ago +1

    You are great bro

  • @NBetweenStations
    @NBetweenStations 2 months ago +1

    How mutually awesome

  • @PoRBvG
    @PoRBvG 2 months ago +1

    Thanks for the great content again!
    Question: In your final diagram, the middle flow (new connection service) shows two layers of Kafka with a stateless consumer in between. Why do we need both layers? Can't the new connection service(s) directly push to the corresponding Kafka shard and avoid the extra Kafka layer and the stateless consumer?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      If we want atomicity for both messages of a connection (A connecting with B, and B connecting with A), we can basically either two-phase commit to both of those Kafka queues, or push to one Kafka queue and handle any message replay on the back end.
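      The single-queue alternative from that reply can be sketched as follows (an assumption-laden illustration, not the video's actual code: partition count, event IDs, and the `applied` log are made up):

```python
# Instead of 2PC across two queues: produce ONE event per new connection,
# keyed by the canonical (min, max) pair so both directions land on the same
# partition; the downstream consumer expands it into both directed updates
# and tolerates replays via an idempotence check.
NUM_PARTITIONS = 4

def partition_for(a: int, b: int) -> int:
    key = (min(a, b), max(a, b))      # canonical key: both sides co-located
    return hash(key) % NUM_PARTITIONS

seen_events: set = set()              # idempotence across message replays
applied: list = []

def consume(event_id: str, a: int, b: int) -> None:
    if event_id in seen_events:       # replayed message: don't double-apply
        return
    seen_events.add(event_id)
    applied.append((a, b))            # a's view gains b...
    applied.append((b, a))            # ...and b's view gains a, together
```

      Because one event carries both directions, the "atomicity" reduces to a single durable produce plus an idempotent consumer, which is why the extra Kafka layer can be debated.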

    • @PoRBvG
      @PoRBvG 2 months ago +1

      @@jordanhasnolife5163 Makes sense. Won't the first Kafka layer and its stateless consumer be a SPoF (especially the stateless consumer, since Kafka is supposed to be highly fault-tolerant)? And if we replicate them to avoid the SPoF, won't we then need 2PC? However, that's solvable if we use TMR (and not DMR).

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      @@PoRBvG I don't know what TMR and DMR are. Inevitably, you always need some amount of consensus or synchronous replication when trying to make sure messages are durable. If you're willing to lose messages, you could very much have an asynchronously replicated Kafka queue and consumers on each of those replicas of the queue.

    • @PoRBvG
      @PoRBvG 2 months ago +1

      @@jordanhasnolife5163 en.wikipedia.org/wiki/Dual_modular_redundancy. Both DMR and TMR help with the SPoF.

  • @truptijoshi2535
    @truptijoshi2535 2 months ago +1

    Does profile update mean updating the latest job or education? If yes, why do we need to update the mutual connection DB for that?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      Yes - because the data is denormalized in our mutual connections database
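      What "denormalized" means for this write path can be shown with a toy cache (field names like `name`/`headline` are illustrative, not the video's schema):

```python
# Each mutual-connections row embeds a snapshot of the connection's profile,
# so one profile update must be propagated to every row that embeds it --
# which is why a profile edit touches the mutual-connections DB at all.
mutual_cache = {
    # (viewer_id, connection_id) -> denormalized profile snapshot
    (1, 3): {"name": "Alice", "headline": "SWE"},
    (2, 3): {"name": "Alice", "headline": "SWE"},
}

def on_profile_update(user_id: int, new_fields: dict) -> int:
    """Fan the update out to every denormalized row for this user."""
    touched = 0
    for (_viewer, conn), snapshot in mutual_cache.items():
        if conn == user_id:
            snapshot.update(new_fields)
            touched += 1
    return touched
```

      The trade-off is the usual one: reads avoid a join, but every profile write fans out to as many rows as the user appears in.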

  • @marcgentner1322
    @marcgentner1322 3 months ago +1

    I have a question on brokers and message queues.
    Do I set up the broker on one server and then set the consumers up on other servers?
    Let's say I have a mail server and I need to classify the emails and, after classification, send them to the right system.
    Where do I host the broker and the AI classification model?

    • @jordanhasnolife5163
      @jordanhasnolife5163  3 months ago +1

      I mean you can technically set them up wherever, but ideally in different containers, yeah

  • @MainDoodler
    @MainDoodler 3 months ago +1

    W as always

  • @JayGujarathi
    @JayGujarathi 2 months ago +1

    How do you do full-text search?
    Can you provide a link to that video?

  • @huguesbouvier3821
    @huguesbouvier3821 3 months ago +1

    Thank you :)!

  • @lalasmith2137
    @lalasmith2137 3 months ago +1

    Hey, I have some questions, if anyone can please help me :)
    1) When Jordan says shard the database by userID, does that mean sharding by the hash of the userID (for consistent hashing)?
    2) Sometimes I see the term "partitioned by" instead of "sharded by"; are those the same?

    • @jordanhasnolife5163
      @jordanhasnolife5163  2 months ago +1

      1) yes
      2) I think so, others seem to disagree
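      What "shard by the hash of the userID" looks like with consistent hashing (a minimal ring sketch; node names and the virtual-node count are made up):

```python
# Hash each userID onto a ring of node positions; a key belongs to the first
# node clockwise from its hash. Virtual nodes smooth out the distribution,
# and adding/removing a node only moves the keys adjacent to it on the ring.
import bisect
import hashlib

def _h(key: str) -> int:
    # md5 gives a stable hash across processes (unlike Python's built-in hash)
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes: int = 8):
        self.ring = sorted((_h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def node_for(self, user_id: int) -> str:
        i = bisect.bisect(self.keys, _h(str(user_id))) % len(self.ring)
        return self.ring[i][1]
```

      "Partitioned by" vs "sharded by" is mostly vocabulary: both mean splitting the keyspace across nodes like this; "shard" tends to imply the splits live on separate machines.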

    • @lalasmith2137
      @lalasmith2137 2 months ago +1

      @@jordanhasnolife5163 thank you so much for taking the time to answer :) also, I can't thank you enough for all the knowledge I've gained since finding your channel

  • @szyulian
    @szyulian 3 months ago +1

    Watched. --