System Design: Distributed Database System Key Value Store

  • Published: 4 Nov 2024

Comments • 190

  • @geekyprogrammer4831
    @geekyprogrammer4831 2 years ago +22

    Where did you disappear, man? Your content is a gold mine!

    • @madhurgautam427
      @madhurgautam427 5 months ago +2

      Bro, he has worked at Temporal Technologies since 2022; before this he worked at Apple for 7 years... ❤

  • @avinashyadav1
    @avinashyadav1 6 years ago +56

    Glad to see you back. Whenever I remember DP your name automatically comes to my mind :)

  • @shoebmoin10
    @shoebmoin10 6 years ago +6

    You are the best DS/ALGO teacher, you helped me a lot through my programming journey.

  • @socialkeviv5444
    @socialkeviv5444 4 years ago +2

    I read a System design book, and got all confused. After watching this video most of my confusion was cleared. Thank you Tushar :)

  • @wdl9108
    @wdl9108 4 years ago +2

    Hello, you are the clearest and best person I have ever seen at explaining algorithms. Thank you very much!

  • @gkcs
    @gkcs 6 years ago +42

    Nice video Tushar!
    It seems to me that when we choose consistency over availability, we horizontally partition the data with replication groups for failover. When going for high availability, replication across nodes with a global consensus helps.
    Btw, at 38:50, I think it's 1000 replication groups for 5 petabytes, instead of the 100 mentioned :)

    • @tusharroy2525
      @tusharroy2525 6 years ago +9

      Yes, I misspoke: I said 100 instead of 1000. I did write 1000 on the board, though. Thanks for pointing it out.
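
The corrected figure is easy to check with arithmetic. A minimal sketch, assuming decimal units and an even spread of data; the 5 TB per group is simply what 5 PB across 1000 groups implies, not a number quoted in this thread, and `per_group_capacity` is a made-up helper name:

```python
TB = 10**12  # decimal terabyte (an assumption; binary units shift the numbers slightly)
PB = 10**15  # decimal petabyte


def per_group_capacity(total_bytes: int, groups: int) -> float:
    """Bytes each replication group must hold if data is spread evenly."""
    return total_bytes / groups


total = 5 * PB
# Compare the spoken figure (100 groups) with the corrected one (1000 groups).
for groups in (100, 1000):
    print(groups, "groups ->", per_group_capacity(total, groups) / TB, "TB per group")
```

With 100 groups each group would have to hold ten times as much data as with 1000, which is why the correction matters for sizing.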

    • @tusharroy2525
      @tusharroy2525 6 years ago +2

      Not sure what you mean by global consensus. That would be a fairly expensive operation. Are you referring to the Cassandra model?

    • @gkcs
      @gkcs 6 years ago

      Yes, I meant the quorum check in Cassandra.
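
For readers unfamiliar with the quorum check mentioned here: with N replicas, a read that contacts R replicas is guaranteed to overlap the latest acknowledged write to W replicas when R + W > N. A minimal sketch of that arithmetic; the function names are made up for illustration and are not Cassandra's API:

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def overlaps(n: int, r: int, w: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return r + w > n


n = 3
q = quorum(n)
print("quorum for", n, "replicas:", q)
print("quorum reads + quorum writes overlap:", overlaps(n, r=q, w=q))
print("single-replica reads and writes overlap:", overlaps(n, r=1, w=1))
```

Reading and writing at quorum costs extra round trips per request, which is the expense raised in the reply above.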

    • @ankitvaidya
      @ankitvaidya 6 years ago +40

      Two awesome YouTubers together in a comment. It made my day :)

    • @Sandeepg255
      @Sandeepg255 3 years ago +1

      HBase vs Cassandra

  • @soumyasengupta3567
    @soumyasengupta3567 6 years ago +2

    Absolutely loved how you demonstrated the importance of consistency through the replication group design, inside and out.

  • @himanshurai485
    @himanshurai485 2 years ago +5

    Nice video Gautam Gambhir 👍

  • @geeksclub8433
    @geeksclub8433 5 years ago +7

    Kudos for the explanation. I recently got a call from Google, and they sent me interview preparation material, one item of which points to your channel :)

    • @chionn8764
      @chionn8764 3 years ago

      Can you please share the material with me? sese.dev.k@gmail.com

  • @meganlee5897
    @meganlee5897 6 years ago +5

    Never got enough of your videos! Thanks for giving back with such high-quality content!

  • @princesharma1202
    @princesharma1202 4 years ago +5

    Please make more videos. Your explanation is really good. I want to land a job at a product-based company (FAANG). PLEASE SHARE YOUR KNOWLEDGE AND EXPERIENCES.

  • @hlibpylypets1333
    @hlibpylypets1333 3 years ago

    This video is the most detailed explanation I have seen of the distributed cache system design. Awesome

  • @aizad786iqbal
    @aizad786iqbal 3 months ago +1

    Where did he go? Find him...
    You should come back...
    Views would skyrocket and it'll really help a lot of people...
    Your videos are clear, precise, and exactly to the point...

  • @kumarc4853
    @kumarc4853 3 years ago

    Thank you Master, excellent video. I was actually asked to design a Distributed DB recently in an interview.

  • @kshitijgoel9007
    @kshitijgoel9007 4 years ago +1

    Please keep making videos .. they are very useful .. thanks

  • @AmanNidhi
    @AmanNidhi 6 years ago +4

    you should have more subs. You have the best explanation for every topic.

  • @annawilson3824
    @annawilson3824 3 years ago +1

    Who is here at the end of 2021? Tushar still rocks!

    • @mysticlunala8020
      @mysticlunala8020 3 years ago

      Why tf do you care about a few internet points/likes that is absolutely useless and utter bs which might give you a moment of satisfaction and then you'll be back to reality where everything is falling apart and you're trying your best to keep it all together?

  • @DAaaMan64
    @DAaaMan64 6 years ago

    Glad to see you're back. Your videos got me through some bad times between jobs. Thanks Tushar.

  • @kapoorvinny1612
    @kapoorvinny1612 4 years ago +1

    Very detailed and focused on key concepts. Well done, Tushar. I can clearly see the amount of effort you are putting in so that people can understand such concepts easily.

  • @shredder_plays
    @shredder_plays 6 years ago +4

    Nice to see you back, sir, after a long time. Please do make some more videos on dynamic programming and graph problems, sir.

  • @manishasinha6694
    @manishasinha6694 5 years ago +1

    Thank you so much Tushar :-) Was just looking for some content on this everywhere and got the most informative and helpful video here.

  • @anandbaraik5010
    @anandbaraik5010 6 years ago +2

    Thanks a ton bro. Your teaching style is just awesome. Learned a lot. Please make videos on DS too.

    • @raghua450
      @raghua450 6 years ago +3

      He has done many videos on data structures, including trees and graphs, all very nicely explained. Check his channel; there are many videos.

    • @anandbaraik5010
      @anandbaraik5010 6 years ago +1

      Raghu Gr ok sure thanks

  • @DeepakGupta-yv8ft
    @DeepakGupta-yv8ft 4 years ago +2

    the man
    the myth
    the legend

  • @balakrishnan3725
    @balakrishnan3725 3 years ago +6

    Nice video Tushar, with a lot of info!
    I have 2 questions on sequence generation. I understand that the node you referred to here is the Request Manager (RM) node. The sequence generation will be done as a first step, soon after the request is received in the RM node.
    Question 1 - How will the sequence be consistent when it gets generated from different nodes? If we use data like nodeId and the node's sequence, then it won't be consistent. Consistent meaning the same sequence should always be generated for a request from any RM node. Maybe I am missing something; please correct my understanding.
    Question 2 - If we use a timestamp, then all the nodes in the RG are expected to have exactly the same timestamp. This is again a challenge.
    So, we need to go with a distributed sequence-generating application which can be used by all the RG systems. This application can generate numbers in sequential order without using nodeId or timestamp. Your thoughts?

  • @20frieza
    @20frieza 5 years ago +2

    This is plain brilliant !!! Thanks a ton for this Tushar. I hope you come out with more stuff like this!

  • @bmujeeb
    @bmujeeb 4 years ago

    Well done Tushar. Well explained.

  • @pinkylover911
    @pinkylover911 2 years ago +2

    great content

  • @enkhboldochirbat3578
    @enkhboldochirbat3578 3 years ago

    Tushar is a great teacher.

  • @guoqiaoli3294
    @guoqiaoli3294 5 years ago +3

    Would it be better to name it a partition group instead of a replication group?

  • @oooRaGe
    @oooRaGe 3 years ago +2

    You sort of alluded to using ZooKeeper as an underlying mechanism for implementing the metadata manager. But ZooKeeper itself is a KV store, of the sort this design is trying to implement. Why doesn't ZooKeeper satisfy the needs of this design? What additional capabilities does this design provide over ZooKeeper? How would you implement the metadata manager without using an existing KV store like ZooKeeper?

  • @vaybhavshawify
    @vaybhavshawify 6 years ago

    Does not get any better..!! Kudos!

  • @finnrichardson9022
    @finnrichardson9022 6 years ago +1

    great video Tushar, keep up the good work!

  • @PiyushGuptaknl
    @PiyushGuptaknl 6 years ago +1

    Nicely explained Tushar! Please share some resources from where you gathered/studied this information.

  • @tyler_ua6593
    @tyler_ua6593 6 years ago +3

    Tushar is back!!!

  • @AnkitMalhotra
    @AnkitMalhotra 5 years ago +4

    Thanks for all the insights. I have a question: How do you plan to handle the situation when the controller node goes down? Is that replicated too?

  • @akshaysuman8168
    @akshaysuman8168 6 years ago

    Glad to see your video dude. Please continue making more

  • @raghua450
    @raghua450 6 years ago +2

    Hi Tushar Roy,
    This is Raghavendra.
    First, let me tell you that I am a follower of your videos.
    Your videos help me a lot in understanding data structures, which helps me solve
    problem-solving questions and improves my knowledge of data structures.
    I am writing this comment to learn how you prepared for all the concepts you
    have covered in your videos: how you learn, what books you read, and how
    you are able to learn so well.
    I am asking because I want to improve my self-study and keep myself up to date
    the way you do, in order to master the concepts like you.
    Please make a video about your practices for improving skills and
    mastering them.
    Sincerely waiting for your reply.
    Regards,
    Raghavendra

    • @tusharroy2525
      @tusharroy2525 6 years ago +4

      These are good questions. I learn a lot from my work experience and from reading stuff online and then connecting the dots. I wish there was one book to rule them all, but there isn't. You will have to watch videos from various people and read stuff to improve your knowledge.

    • @raghua450
      @raghua450 6 years ago

      Thanks for the reply, Tushar.
      I asked you this question because only you explained the red-black tree so nicely.
      I read many blogs on the internet before watching your video. Because of your video, I have taught many people about red-black trees.
      You are the reason I started learning from YouTube.
      I have now been solving data structure problems on HackerRank for three months.
      I was wondering how you learned about red-black trees, since a lot of the material available on the internet was incomplete
      or I found it difficult to understand completely.
      I figured you must have a really good plan for learning new things that you don't know.
      That is the reason I asked you this question.
      No need to reply if you would say the same thing as in your previous reply.
      Regards,
      Raghavendra

    • @tusharroy2525
      @tusharroy2525 6 years ago +2

      Red black tree I learned from Wikipedia.

  • @souviksen9152
    @souviksen9152 6 years ago +1

    Awesome video. Please make a system design video on a voice-activated system like Alexa or Siri and Airbnb.

  • @praticksharma4820
    @praticksharma4820 4 years ago

    Most underrated channel. I wish I could contribute to your channel. Keep up the good work!

  • @jonahren2649
    @jonahren2649 5 years ago

    one of the best system design videos, great thanks!

  • @yawei8845
    @yawei8845 5 years ago

    Tushar is back! Looks like you've gained some weight.

  • @mingl2178
    @mingl2178 2 years ago

    I recommend emphasizing that each replication group can handle "f" "crash failures" instead of simply "bad nodes". The reason is that "bad nodes" includes Byzantine nodes, which cannot be handled in this setup.

    • @mingl2178
      @mingl2178 2 years ago

      Very good video overall
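
The distinction drawn in this comment can be made concrete with the standard sizing rules: majority-based replication needs 2f + 1 nodes to survive f crash failures, while Byzantine fault tolerance needs at least 3f + 1. A minimal sketch of that arithmetic; the function names are hypothetical:

```python
def crash_tolerant_size(f: int) -> int:
    """Smallest group that keeps a majority alive after f crash failures."""
    return 2 * f + 1


def byzantine_tolerant_size(f: int) -> int:
    """Smallest group that can tolerate f Byzantine (arbitrarily misbehaving) nodes."""
    return 3 * f + 1


def max_crash_failures(n: int) -> int:
    """Crash failures a majority-based group of n nodes can survive."""
    return (n - 1) // 2


for f in (1, 2):
    print("f =", f, "crash:", crash_tolerant_size(f), "byzantine:", byzantine_tolerant_size(f))
print("a 3-node group survives", max_crash_failures(3), "crash failure(s)")
```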

  • @ravhaak
    @ravhaak 5 years ago

    Miles to go before you sleep.
    Could you please prepare system design and LLD for the following:
    1. Simulation of a cricket match, football match etc.
    2. Implementation of Queue like Kafka
    3. Ecommerce price drop notification system for 50M products
    4. Amazon like website and order management system i.e. everything that happens after clicking checkout
    5. Elevator system
    6. Scrabble
    7. Chess game
    8. A library for evaluation of expression

  • @AolaDIY
    @AolaDIY 4 years ago

    Google is coming out with a website for the coronavirus crisis! Please, please do a system design of that! I predict it will be a super hot topic!

  • @abhishes
    @abhishes 6 years ago

    Good to see you back.

  • @noblessc
    @noblessc 5 years ago +1

    Hi Tushar, thanks a ton for the videos. I have a question about the sequencer. It is not clear to me who generates the sequence number (the Request Manager or the RG leader?). If it is the Request Manager, then two PUT requests for the same record/key that land on two different Request Managers would get different sequence numbers, and sequence numbers from two different Request Managers cannot be used for logical ordering. In such cases, which PUT should win? Please correct me if I have misinterpreted it.

    • @guangyang5116
      @guangyang5116 5 years ago

      Yes, I agree. I think storing them somewhere in RG leader makes more sense.

  • @Cosciug1234
    @Cosciug1234 6 years ago

    awesome video! i'm glad you're making more system design videos

  • @SahilThakur26
    @SahilThakur26 3 years ago +1

    How does the sequencer resolve conflicts if two puts come at the same time? I think it would still create a conflict, as the timestamp plus the other sequencer params would be the same.

  • @SusilRamarao
    @SusilRamarao 5 years ago +1

    Hi Tushar, I am able to understand the algorithms from your DP videos and apply them on my own, and they work very well. I need tips on how to improve further. Thank you :-)

  • @83rossb
    @83rossb 6 years ago

    Tushar you're gonna break tech interviewing.

    • @tusharroy2525
      @tusharroy2525 6 years ago

      Lol. ?

    • @83rossb
      @83rossb 6 years ago

      Systems design interview questions were previously only answerable by people with actual experience; now, after watching your videos, anyone with a YouTube account can answer these and waltz right into any tech company. Hey, I have a video request: can you do a video talking about what you do for fun outside of coding?

  • @SiddharthKulkarniN
    @SiddharthKulkarniN 6 years ago

    Welcome back dude. Great vid.

  • @nitinvarun7700
    @nitinvarun7700 4 years ago +2

    Hello Sir,
    Very good overview. It would be more helpful if you also put links in the description from which we can study the topics further.

  • @stackunderflow5951
    @stackunderflow5951 2 years ago

    I think when a new leader is elected, the other nodes can get from the metadata service that there is a new leader and it's not necessary for the new leader to talk to others?

  • @pradipnitw
    @pradipnitw 6 years ago +1

    loved this video. thanks for uploading.

  • @tzu-lingkan8199
    @tzu-lingkan8199 4 years ago +1

    Looking forward to your new videos!! You've been gone so long~

  • @ashutoshmishra2328
    @ashutoshmishra2328 4 years ago

    Hi Tushar, thanks for the great content.
    One question:
    How is it going to store records with multiple columns? In the Stocks table we just have Name (key) and Price (value), but if we have a user table with user ID (which can be considered the key), First Name, Last Name, Email, etc., how will the values be stored?
    In wide-column and document-based databases we can store them as JSON, but how would it work in a key-value database?

  • @Venkat2811
    @Venkat2811 6 years ago +1

    Excellent video! Considering your experience on the S3 team, did you get a chance to work on any of these components?

  • @bephrem
    @bephrem 6 years ago

    Tushar. There is a HUGE need for more videos like yours. If you made a course that had ALL of the LeetCode concepts explained (DP, Backtracking, Hash Tables, Divide and Conquer, Arrays, Strings, Stack, Graphs, DFS, BFS, Binary Search, Trees, etc.) with problems and it was a high-quality series going over 100-200 problems or so, me, as a college student about to interview for engineering jobs would personally buy that for a good price. I think this is a huge opportunity and gap in this space.
    There is a need for people who can explain these concepts, the questions, and their solutions eloquently and in a way that makes sense. For example, your tree videos really set a foundation for me to see recursion the right way; although I've done tree projects in and outside of class, it never clicked until I saw your series.
    Normally, if I can't get a LeetCode problem and can't find the solution, I always find videos in Chinese that don't even explain their solution that well... it is a problem... and an opportunity for you.
    Just a thought.

  • @adamtran5747
    @adamtran5747 1 year ago +1

    love you Tushar

  • @smartello87
    @smartello87 4 years ago

    What if the controller fails? It has no redundancy, yet it ensures the reliability of the other nodes, which makes it a single point of failure. Something like an active pair would be a solution here, I guess?

  • @alexmiragall9775
    @alexmiragall9775 6 years ago +4

    Hi Tushar, nice video! I have a question: to create the sequence number, you use a timestamp to make sure that a value stored later has a bigger sequence value. Where do you generate this timestamp? What if the clocks on different machines are not synchronized? Thank you very much for your videos.

  • @vburlai
    @vburlai 5 years ago

    Hi Tushar, thank you for the videos.

  • @deathbombs
    @deathbombs 2 years ago

    Is having a controller that handles splits automatically a good idea? What if the splitting algorithm turns out to be bad? Maybe we should use consistent hashing.

  • @ask5583
    @ask5583 3 years ago

    awesome explanation!! thanks!!

  • @souravmukherjee1484
    @souravmukherjee1484 5 years ago

    What would be the differences in the metadata manager and controller when designing a database that favors availability over consistency? Maybe some practical examples of different kinds of databases would help, say DynamoDB, Bigtable, MongoDB?

  • @Venkat2811
    @Venkat2811 6 years ago

    Glad you're back !!

  • @indarjit007
    @indarjit007 5 years ago

    Nice video! I have a question. Suppose that, after a network partition, the leader node that was disconnected comes back into the network. This node now has old data for a lot of records. Will it sync its data with the rest of the nodes in the group, or will it ignore the missed updates and simply take new writes? Old data on one node will not impact the overall read accuracy of the group, because the majority will have the updated data.

  • @256cool
    @256cool 3 years ago

    Nice video. I have a question (@32:29): when there is more than one leader in a replication group and a 'get' request goes to the old leader, why would the old leader talk to the other nodes in the group instead of just returning the data to the request manager?

  • @waqarahmad5939
    @waqarahmad5939 3 years ago

    Nice video. I see a problem in your leader election. You say that the metadata manager will elect the leader, but in your explanation it seems the leader puts itself up for election and the one that has the majority is confirmed as leader. The confusion is reinforced when an old leader comes alive again: instead of getting confirmation from the metadata manager, you demonstrated a contention between two leaders. It seems you have mixed Paxos (where the cluster chooses its leader and there is no metadata (config) manager) with metadata-manager-based election. I may be wrong.

  • @WellyBoyJacko
    @WellyBoyJacko 4 years ago

    Great video Tushar! Although you could have done the entrance where you appear from one side, like in the KMP algorithm video; I believe it to be a lot more effective. Thanks!

  • @SonuSonu-tk5pk
    @SonuSonu-tk5pk 6 years ago +2

    Legend returns ~~~

  • @junjiang3486
    @junjiang3486 6 years ago

    Hey man please come back with more awesome videos!

  • @manoharc07
    @manoharc07 4 years ago

    Okay, here's the thing: this video is irrelevant to me, but it is the latest video, so I commented here. Your algo videos are super useful.
    Please consider making videos again.

  • @deathbombs
    @deathbombs 2 years ago

    What if we design for an eventually consistent data store? One that supports transactions, such as rolling back?

  • @anshulabhinav13
    @anshulabhinav13 5 years ago +6

    Why is "Chicken recipe" a part of this playlist? :))

    • @blasttrash
      @blasttrash 5 years ago

      lol yeah. that's so funny

  • @yazicib1
    @yazicib1 5 years ago

    The way you defined sequence numbers, you cannot compare them between machines and decide which one came later unless you have a centralized time server. You cannot rely on local timestamp to make such decisions....

    • @guangyang5116
      @guangyang5116 5 years ago

      Agree. I'm also thinking about how to generate sequence number and where to store them.

  • @adnanniazi9954
    @adnanniazi9954 2 years ago

    To remove an articulation point from a graph we need to ?
    A: remove an edge
    B: add an edge
    C: both a and b
    D: none of the above

  • @dipukrishnan6383
    @dipukrishnan6383 3 years ago

    Can you suggest one good book on data structures and algorithms? I have searched the internet and found so many books that it's really confusing. If you can suggest one book that covers everything and can be used as a good reference when preparing for coding interviews, that would be great. And if that book is written for C++, that's even better.

    • @thepriestofvaranasi
      @thepriestofvaranasi 3 years ago

      Refer to the 6th edition of the book Cormen. It is really good for Data structures and algorithms.

  • @Jaypatel512
    @Jaypatel512 5 years ago

    Why use the higher sequence number? Shouldn't the lower sequence number win, considering it was first?

  • @Jason-be2cy
    @Jason-be2cy 6 years ago

    Welcome back!

  • @kingofwebguru
    @kingofwebguru 2 years ago

    It would be great to have the references.

  • @d.barisacar2635
    @d.barisacar2635 3 years ago

    No ACID is a bit misleading, since the database designed should be "durable" at least, so the "D" in ACID.

  • @PrashantNigam
    @PrashantNigam 5 years ago

    Hi Tushar! Can you please make a video on Single Pair All shortest paths problem? Given a source and a destination find all the shortest paths between them (there can be more than 1 path with the same length, hence all the shortest paths).

  • @Logan_deadpool
    @Logan_deadpool 6 years ago

    Nice video, thanks! Could you tell me what's the difference between unique per node and unique node id?

    • @tusharroy2525
      @tusharroy2525 6 years ago +2

      Each node will be assigned a unique number; that's the unique node ID. Then each node will generate a unique number of up to 4 bytes, which will roll back to 0.
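
The pieces described in this reply can be put together in a small sketch. It assumes the sequence number is a tuple of (timestamp in milliseconds, node ID, per-node counter); the field order, the millisecond resolution, and the `Sequencer` class itself are illustrative assumptions, not details confirmed in the thread. Only the unique node ID and the 4-byte counter that rolls back to 0 come from the reply.

```python
import time
from typing import Callable, Tuple

COUNTER_MODULUS = 2**32  # 4-byte per-node counter, rolls back to 0


def _now_ms() -> int:
    """Wall-clock time in milliseconds (assumed resolution)."""
    return time.time_ns() // 1_000_000


class Sequencer:
    """Hypothetical per-node sequence generator."""

    def __init__(self, node_id: int, clock: Callable[[], int] = _now_ms):
        self.node_id = node_id  # unique across nodes
        self._clock = clock     # injectable so the example is deterministic
        self._counter = 0

    def next_seq(self) -> Tuple[int, int, int]:
        seq = (self._clock(), self.node_id, self._counter)
        self._counter = (self._counter + 1) % COUNTER_MODULUS
        return seq


# Two nodes whose clocks happen to read the same millisecond.
a = Sequencer(node_id=1, clock=lambda: 1000)
b = Sequencer(node_id=2, clock=lambda: 1000)
s1, s2, s3 = a.next_seq(), a.next_seq(), b.next_seq()
print(s1, s2, s3)
print("winner:", max(s1, s2, s3))  # tuples compare field by field
```

The counter separates two puts handled by the same node within one millisecond, and the node ID breaks ties between nodes. As several comments point out, ordering across machines is still only as good as their clock synchronization; this sketch does not address that.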

  • @charanteja6275
    @charanteja6275 6 years ago

    Thanks Tushar. Can you make a video on choosing the right metadata manager, like ZooKeeper or Redis, for distributed systems?

    • @nikhil199029
      @nikhil199029 6 years ago +1

      Look at the first video in this series.

  • @antonkot6250
    @antonkot6250 6 years ago +1

    It's an interesting video about how Hadoop/Mongo-like distributed systems operate. IMO, it's not a very common interview question or system design task.

  • @riturajchauhan8642
    @riturajchauhan8642 5 years ago

    QQ: how is the sequence number continuously increasing? I am trying to understand the unique and increasing nature of the sequence number. And how would we compare one sequence number against another?

  • @shubhamgoel9512
    @shubhamgoel9512 6 years ago

    Hi Tushar,
    I have one question. You said each put will go through the majority of nodes. Suppose there are 3 nodes in 1 RG: node1 (leader), node2, node3. Now there are 2 put requests: put(a, 30) and put(b, 30). put(a, 30) succeeded on node1 and node2 (the majority is 2), and put(b, 30) succeeded on node1 and node3. Now node1 dies. Which node will be the new leader, node2 or node3? (Both nodes have partial data now.)
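
One way to reason about this scenario, not necessarily what the video's design does: every acknowledged put reached 2 of the 3 nodes, so after a single node dies each write still exists on at least one survivor. Whichever node becomes leader could catch up by merging with the other survivor, keeping the record with the higher sequence number for each key. A minimal sketch under those assumptions; the sequence numbers 1 and 2 and the `merge` helper are made up for illustration:

```python
from typing import Dict, Tuple

Record = Tuple[int, int]   # (sequence number, value)
Store = Dict[str, Record]  # key -> latest record known to that replica


def merge(*replicas: Store) -> Store:
    """Union of replicas, keeping the highest sequence number per key."""
    merged: Store = {}
    for replica in replicas:
        for key, record in replica.items():
            if key not in merged or record[0] > merged[key][0]:
                merged[key] = record
    return merged


# State after node1 (the old leader) dies, as in the comment above.
node2: Store = {"a": (1, 30)}  # saw put(a, 30) only
node3: Store = {"b": (2, 30)}  # saw put(b, 30) only

recovered = merge(node2, node3)
print(recovered)
```

Under this approach either survivor can become leader, because the merged view of the remaining majority contains both writes.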

  • @_thehunter_
    @_thehunter_ 6 years ago

    good video, you should do a video about postmortem or internals of any one KVpair/NoSQL/NewSQL DB.. peace

  • @chunminghe281
    @chunminghe281 6 years ago

    Long time no see!

  • @nicolasgoosen5142
    @nicolasgoosen5142 3 years ago

    You've got nearly 200K subscribers - why'd you stop making videos???

  • @Jsr10008
    @Jsr10008 6 years ago

    Hello Tushar Sir, Please share some knowledge on Multithreading and Concurrency. Thanks in Advance 👍🏻

    • @AmanGarg95
      @AmanGarg95 5 years ago

      You can follow the channel @DefogTech for multithreading videos.

  • @rubencashie9441
    @rubencashie9441 6 years ago

    Yay new video!

  • @chionn8764
    @chionn8764 3 years ago

    Is there a reference paper for the ideas presented in this video?

  • @Flower_withanshi
    @Flower_withanshi 5 years ago

    Can you please explain this?
    I am running a Java application in a distributed environment with multiple MySQL instance nodes. Can you please help me understand how all the nodes sync? In this kind of scenario, what do we need to take care of in the Java application at the code level? (Suppose a banking application with money transfers: one node is used for an ATM withdrawal, and the balance is checked a few milliseconds later through a different node.)
    Please explain with an example.

  • @turinreza
    @turinreza 6 years ago

    Tushar, can you share a Paxos C++ implementation for keeping a few nodes in sync?

  • @true_human_007
    @true_human_007 3 years ago

    What is the significance of "unique per node" in the sequencer when we already have the node ID in the sequencer?

  • @shreyageek
    @shreyageek 2 years ago

    Google Interview Prep links to his videos too...✌🏻✌🏻

  • @deathbombs
    @deathbombs 2 years ago

    I like how you go into great depth on leader electing for groups, and the many edge cases of the system. But I feel maybe it's inappropriate? No interviewer would be asking you to design KV storage, and spend so much time on the leader management edge cases

  • @abhashjain4588
    @abhashjain4588 6 years ago

    Hi Tushar,
    Thanks for the nice video. I was wondering about one question.
    How would you implement a Unix-like find?
    My approach: since the file system has a dirent (directory entry) structure and files live within dirents, we visit each and every dirent and find the files that match the criteria. If a file matches, we add it to the list of files to return.
    If there are one or more criteria, we can have a criteria class and match each file against all the criteria classes.
    Can you help me with a better solution?
    Thanks in advance. I have tried to google this a lot and couldn't find any solution to this question.