Where did you disappear to, man? Your content is a gold mine!
Bro, he has worked at Temporal Technologies since 2022; before that he worked at Apple for 7 years... ❤
Awesome video! I'm glad you're making more system design videos.
love you Tushar
Is having a controller handle splits automatically a good idea? What if the splitting algorithm turns out to be bad? Maybe it should use consistent hashing instead.
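For comparison with controller-driven splits, here is a minimal consistent-hashing sketch. It illustrates the alternative the commenter suggests and is not the video's design. It assumes MD5 for ring positions and 100 virtual nodes per physical node; node names such as "rg-1" are hypothetical. Adding a node only moves the keys that fall on that node's new arcs of the ring.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes; adding or removing a node only moves nearby keys."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node, smooths load
        self._ring = []       # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        h = self._hash(key)
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

The trade-off is that hashing scatters adjacent keys, which makes range scans expensive, and load is balanced only statistically; a single hot key still lands on one node.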
Nice video Tushar!
It seems to me that when we choose consistency over availability, we horizontally partition the data, with replication groups for failover. When going for high availability, replication across nodes with a global consensus helps.
Btw, at 38:50 I think it's 1000 replication groups for 5 PB, not the 100 mentioned :)
Yes, I misspoke: I said 100 instead of 1000, though I did write 1000 on the board. Thanks for pointing it out.
Not sure what you mean by global consensus. That would be a fairly expensive operation. Are you referring to the Cassandra model?
Yes, I meant the quorum check in Cassandra.
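Here is the quorum idea in a few lines, as a sketch. With N replicas, a write acknowledged by W nodes and a read that consults R nodes are guaranteed to share at least one replica when R + W > N. The majority formula is the same one Cassandra uses for its QUORUM consistency level with a single data center. The function names are made up for illustration.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > n
```

Overlap alone does not order concurrent writes to the same key. Cassandra resolves those by write timestamp (last write wins), so no global agreement on write order is needed.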
Two awesome youtubers together in a comment. It made my day :)
HBase vs Cassandra
Glad to see you back. Whenever I remember DP your name automatically comes to my mind :)
Lol
Ya same me too 😀😀
same for me!
same here
:D
+1
Please make more videos. Your explanations are really good. I want to land a job at a product-based company (FAANG). PLEASE SHARE YOUR KNOWLEDGE AND EXPERIENCES.
Nice video, Tushar, with lots of info!
I have 2 questions on sequence generation. I understand that the node you referred to here is the Request Manager (RM) node, and that sequence generation is the first step, right after the RM node receives the request.
Question 1: How will the sequence be consistent when it is generated on different nodes? If we use data like the nodeId and the node's own counter, it won't be consistent. By consistent I mean that the same sequence should always be generated for a request, whichever RM node handles it. Maybe I am missing something; please correct my understanding.
Question 2: If we use a timestamp, all the nodes in the RG are expected to have exactly the same time. This is again a challenge.
So we need a distributed sequence-generating service that all the RGs can use, one that generates numbers in sequential order without using the nodeId or a timestamp. Your thoughts?
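One common way to build such a sequence number is a tuple of (timestamp, node ID, per-node counter) compared field by field. This is a sketch under assumptions and not necessarily what the video implements. The counter keeps numbers unique and increasing on one node even when the clock repeats or steps back. The node ID breaks ties between nodes that produce equal timestamps. As the question points out, ordering across nodes is still only as good as clock synchronization.

```python
import threading
import time
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class SequenceNumber:
    # Field order matters: dataclass(order=True) compares fields left to right.
    timestamp_ns: int
    node_id: int
    counter: int


class Sequencer:
    """Per-node generator; `node_id` is assumed to be unique across the cluster."""

    def __init__(self, node_id: int, clock=time.time_ns):
        self.node_id = node_id
        self._clock = clock
        self._counter = 0
        self._last_ts = 0
        self._lock = threading.Lock()

    def next(self) -> SequenceNumber:
        with self._lock:
            ts = max(self._clock(), self._last_ts)  # never go backwards locally
            # Same timestamp as last time: bump the counter; new timestamp: reset it.
            self._counter = self._counter + 1 if ts == self._last_ts else 0
            self._last_ts = ts
            return SequenceNumber(ts, self.node_id, self._counter)
```

The node ID only makes the order total and deterministic. It does not reveal which of two same-timestamp puts on different nodes really happened first.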
You sort of alluded to using ZooKeeper as the underlying mechanism for implementing the metadata manager. But ZooKeeper is itself a KV store of the sort this design is trying to implement. Why doesn't ZooKeeper satisfy the needs of this design? What additional capabilities does this design provide over ZooKeeper? How would you implement the metadata manager without an existing KV store like ZooKeeper?
Nice video Gautam Gambhir 👍
Kudos for the explanation. I recently got a call from Google, and they sent me interview preparation material, one item of which points to your channel :)
Can you please share the material with me? sese.dev.k@gmail.com
Wouldn't it be better to name it a partition group instead of a replication group?
How does the sequencer resolve conflicts if two puts arrive at the same time? I think there would still be a conflict, since the timestamp and the other sequencer parameters would be the same.
Hi Tustard, nice video! I have a doubt: to create the sequence number, you use a timestamp to make sure that a later stored value has a bigger sequence value. Where do you generate this timestamp? What if the clocks on different machines are not synchronized? Thank you very much for your videos.
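A well-known technique for unsynchronized clocks is a hybrid logical clock (HLC). It is offered here as one possible answer, not as the video's method. Each node stamps events with (physical time, logical counter). When a node receives a timestamp from another node, it advances past it, so a put that causally follows another always gets a larger timestamp even if the local clock lags. This minimal sketch takes the clock as a parameter so it can be tested.

```python
import time


class HybridLogicalClock:
    """Hybrid logical clock: timestamps are (wall_ns, logical) pairs."""

    def __init__(self, physical_clock=time.time_ns):
        self._pt = physical_clock
        self.l = 0  # largest physical time seen so far
        self.c = 0  # logical counter for events sharing the same `l`

    def now(self) -> tuple[int, int]:
        """Timestamp a local event or an outgoing message."""
        prev = self.l
        self.l = max(prev, self._pt())
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def update(self, remote: tuple[int, int]) -> tuple[int, int]:
        """Merge a timestamp received from another node."""
        rl, rc = remote
        prev = self.l
        self.l = max(prev, rl, self._pt())
        if self.l == prev == rl:
            self.c = max(self.c, rc) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        return (self.l, self.c)
```

For example, if node A's clock reads 1000 and node B's reads 500, a message stamped (1000, 0) by A makes B's next stamp (1000, 1) rather than something near 500. Puts that are truly concurrent on different nodes still need a deterministic tie-breaker, such as appending the node ID.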
Tustard? Really?
You are the best DS/ALGO teacher, you helped me a lot through my programming journey.
Why is "Chicken recipe" a part of this playlist? :))
lol yeah. that's so funny
Where did he go? Go find him...
You should come back...
Views would skyrocket and it'd really help a lot of people....
Your videos are clear, precise and exactly to the point...
Nice to see you back after a long time, sir. Please do make some more videos on DYNAMIC PROGRAMMING AND GRAPH problems, sir.
Never got enough of your videos! Thanks for the high-quality giving backs!
Thx
Tushar bro, please give me your channel 😭 My channel has been suspended.
the man
the myth
the legend
great content
Hi Tushar, thanks a ton for the videos. I have a question about the sequencer. It is not clear to me who generates the sequence number (the Request Manager or the RG leader?). If it is the Request Manager, then two PUT requests for the same record/key that land on two different Request Managers would get sequence numbers from different generators, and those cannot be used for logical ordering. In such cases, which PUT should win? Please correct me if I have misinterpreted it.
Yes, I agree. I think keeping them somewhere in the RG leader makes more sense.
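This follows the suggestion to let the RG leader assign sequence numbers. If every put for a replication group passes through its leader, a log index prefixed by the leader's term (epoch) gives a total order without relying on wall clocks. This is a minimal, single-threaded sketch; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class LeaderSeq:
    term: int   # bumped on every leader election
    index: int  # log position, continuing across terms


class ReplicationGroupLeader:
    def __init__(self, term: int, last_index: int = 0):
        self.term = term
        self._next_index = last_index + 1

    def assign(self, key: str, value) -> tuple[LeaderSeq, str, object]:
        """Stamp a put; two puts on the same key always get distinct, ordered stamps."""
        seq = LeaderSeq(self.term, self._next_index)
        self._next_index += 1
        return seq, key, value
```

A new leader continues from the last index it knows about, under a higher term. The trade-off is that each leader becomes the ordering point for its group, which is one reason to spread keys across many replication groups.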
Thanks for all the insights. I have a question: How do you plan to handle the situation when the controller node goes down? Is that replicated too?
Hello, you are the clearest and best person I have ever seen at explaining algorithms. Thank you very much!
Where are u sir?
I read a system design book and got all confused. After watching this video, most of my confusion was cleared. Thank you Tushar :)
Please keep making videos... they are very useful. Thanks
Nicely explained Tushar! Please share some resources from where you gathered/studied this information.
Google is coming out with a website for the coronavirus crisis! Please, please do a system design of that! It will be a super hot topic, I predict!
Who is here at the end of 2021? Tushar still rocks!
Why tf do you care about a few internet points/likes? They are absolutely useless and utter bs; they might give you a moment of satisfaction, and then you'll be back to reality, where everything is falling apart and you're trying your best to keep it all together.
Awesome video. Please make a system design video on a voice-activated system like Alexa or Siri, and one on Airbnb.
you should have more subs. You have the best explanation for every topic.
Hi Tushar, I am able to understand and apply the algorithms from your DP videos on my own, and it works very well. I need tips on how to improve further. Thank you :-)
So much background noise. Please fix it next time.
Excellent video! Considering your experience on the S3 team, did you get a chance to work on any of these components?
They had a different kind of system.
Looking forward to your new videos!! You've been gone so long~
Very detailed and focused on key concepts. Well done, Tushar. I can clearly see the amount of effort you are putting in so people can understand such concepts easily.
Hi Tushar Roy,
This is Raghavendra.
First, let me tell you that I am a follower of your videos.
Your videos are helping me a lot in understanding data structures, which helps me solve problem-solving questions and improves my data structures knowledge.
I am writing this comment to learn how you prepared all the concepts you have made videos on. I mean, how do you learn, what books do you read, and how are you able to learn so well?
The reason I'm asking is that I want to get better at self-study and at keeping myself updated like you do, so that I can master concepts the way you have.
Please make a video about your practices for improving your skills and mastering them.
Sincerely waiting for your reply.
Regards,
Raghavendra
These are good questions. I learn a lot from my work experience and by reading stuff online and then connecting the dots. I wish there were one book to rule them all, but there isn't. You will have to watch videos from various people and read stuff to improve your knowledge.
Thanks for the reply, Tushar.
I asked you this question because only you explained the Red-Black Tree so nicely.
I read many blogs on the internet before watching your video. Because of your video, I have taught many people about Red-Black Trees.
You are the reason I started learning from YouTube.
I have been solving data structure problems on HackerRank for three months now.
Then I wondered how you learned about Red-Black Trees, since a lot of the material available on the internet was incomplete,
or I found it difficult to understand completely.
So I thought you must have a really good plan for learning new things you don't know.
That is the reason I asked you this question.
No need to reply if you want to say the same thing as in your previous reply.
Regards,
Raghavendra
Red black tree I learned from Wikipedia.
Major architecture flaw: not using consistent hashing, which is every interviewer's favourite. "Hey, wouldn't the data partitions be imbalanced?" blah blah blah
Miles to go before you sleep.
Could you please prepare system design and LLD for the following:
1. Simulation of a cricket match, football match etc.
2. Implementation of a queue like Kafka
3. Ecommerce price drop notification system for 50M products
4. Amazon-like website and order management system, i.e. everything that happens after clicking checkout
5. Elevator system
6. Scrabble
7. Chess game
8. A library for evaluating expressions
Tushar is back!!!
You've got nearly 200K subscribers - why'd you stop making videos???
What if the controller fails? There's no backup for it, yet it ensures the reliability of the other nodes, which makes it a single point of failure. Something like an active pair would be a solution here, I guess?
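One way to remove that single point of failure is to run the controller as an active/standby pair, or as several replicas. The active one holds a time-limited lease recorded in a strongly consistent store, such as the metadata manager, and a standby takes over only after the lease expires. This is an assumption, not something the video specifies. The single-process sketch below takes the clock as a parameter; LeaseStore and controller_loop_step are hypothetical names.

```python
import time


class LeaseStore:
    """Stand-in for a strongly consistent store that records the controller lease."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate: str, ttl: float) -> bool:
        """Grant or renew the lease if it is free, already held by `candidate`, or expired."""
        now = self._clock()
        if self.holder in (None, candidate) or now >= self.expires_at:
            self.holder = candidate
            self.expires_at = now + ttl
            return True
        return False


def controller_loop_step(name: str, leases: LeaseStore, ttl: float = 10.0) -> str:
    """One iteration of a controller replica: act only while holding the lease."""
    return "active" if leases.try_acquire(name, ttl) else "standby"
```

A real deployment also needs the old controller to stop acting once its lease lapses, for example by using fencing tokens. Without that, a long pause could leave it believing it still holds the lease.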
Summary of leader management:
1. Followers reject messages from non-leaders.
2. Leaders are the ones confirmed by the majority, to avoid requests going to the wrong place.
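Both rules can be sketched with a term (epoch) number, assuming each new leader is issued a higher term, whether by majority vote or by the metadata manager. A follower remembers the highest term it has seen and rejects anything older, so a deposed leader that reconnects cannot overwrite newer data. The class and method names are illustrative.

```python
class Follower:
    def __init__(self):
        self.current_term = 0
        self.leader_id = None
        self.data = {}

    def handle_put(self, term: int, leader_id: str, key: str, value) -> bool:
        """Apply a replicated put only if it comes from the newest known leader."""
        if term < self.current_term:
            return False                 # stale leader: reject
        if term > self.current_term:
            self.current_term = term     # learn about a newer leader
            self.leader_id = leader_id
        elif self.leader_id is None:
            self.leader_id = leader_id
        elif leader_id != self.leader_id:
            return False                 # at most one leader per term
        self.data[key] = value
        return True
```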
Hi Tushar,
I have one doubt. You said each put will go through the majority of nodes. Suppose there are 3 nodes in 1 RG: node1 (leader), node2, node3. Now there are 2 put requests, put(a, 30) and put(b, 30). put(a, 30) succeeded on node1 and node2 (the majority is 2), and put(b, 30) succeeded on node1 and node3. Now node1 dies. Which node will be the new leader, node2 or node3? (Both nodes have partial data now.)
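For comparison, here is how a log-based protocol such as Raft handles this. Raft is named only as a point of comparison, not as the video's protocol. In Raft, this partial-data state cannot arise in that exact form: a follower accepts log entry i only if it already holds entry i-1. So if put(b) was ordered after put(a), node3 must have received put(a) first. During an election, a voter refuses any candidate whose log is less up to date than its own, comparing (last term, last index). The sketch below applies that rule to the scenario, with node3 holding both puts.

```python
from dataclasses import dataclass


@dataclass
class LogState:
    last_term: int   # term of the last entry in the node's log
    last_index: int  # position of the last entry in the node's log


def candidate_up_to_date(candidate: LogState, voter: LogState) -> bool:
    """Raft-style vote rule: vote only for a candidate whose log is at least as new."""
    return (candidate.last_term, candidate.last_index) >= (
        voter.last_term,
        voter.last_index,
    )


def can_win(candidate: str, live: dict[str, LogState], cluster_size: int) -> bool:
    """A candidate needs votes (including its own) from a majority of the cluster."""
    votes = sum(candidate_up_to_date(live[candidate], state) for state in live.values())
    return votes > cluster_size // 2
```

Here node2 holds only put(a), so its log ends at index 1. node3 holds put(a) and put(b), so its log ends at index 2. Only node3 can collect a majority, and it then brings node2 up to date.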
QQ: how does the sequence number keep increasing? I am trying to understand the unique and increasing nature of the sequence number. And how would we compare one sequence number against another?
great video Tushar, keep up the good work!
Thank you so much Tushar :-) I was looking everywhere for content on this and found the most informative and helpful video here.
Hello Fraaaaands, Chai Pee Lo ("have some tea") ☕
😂😂😂😂😂😂😂😂😂😂
Okay, here's the thing: this video is irrelevant to me, but it's the latest video, so I'm commenting here. Your algo videos are super useful.
Please consider making videos again
What if we design an eventually consistent data store? One that supports transactions, such as rolling back?
"No ACID" is a bit misleading, since the database being designed should at least be durable, i.e. the "D" in ACID.
Google's interview prep material links to his videos too... ✌🏻✌🏻
Thanks a ton, bro. Your teaching style is just awesome. Learned a lot. Pls make videos on DS too.
He has done many videos on data structures like trees and graphs, very nicely explained. Check his channel; there are many videos.
Raghu Gr ok sure thanks
Why use the higher sequence number? Shouldn't the lower sequence number win, considering it came first?
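In a last-writer-wins scheme, the higher sequence number wins because it was written later. The stored value should reflect the most recent put, just as a second write overwrites the first on a single machine. If the lower number won, a delayed or retried old put could overwrite a newer value. Below is a minimal sketch of the merge a replica might perform, assuming sequence numbers are comparable tuples.

```python
def apply_put(store: dict, key: str, value, seq: tuple) -> bool:
    """Last-writer-wins: keep the value with the highest sequence number."""
    current = store.get(key)
    if current is not None and current[1] >= seq:
        return False  # an equal or newer write is already stored; ignore this one
    store[key] = (value, seq)
    return True
```

Because the comparison does not depend on arrival order, replicas that receive the same puts in different orders end up with the same value.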
Nice video. I have a doubt (@32:29): when there is more than one leader in a replication group and a 'get' request goes to the old leader, why would the old leader talk to other nodes in the group instead of just returning the data to the request manager?
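This is the usual reasoning in leader-based protocols; the video's exact rationale may differ. A leader that has been partitioned away may not know that a newer leader exists, so its copy of the data may be stale. If it confirms with a majority that it is still the leader before replying, it cannot serve a stale read. The sketch below models each peer as a hypothetical callback that says whether that peer still accepts the given term.

```python
from typing import Callable


def serve_read(
    key: str,
    my_term: int,
    store: dict,
    peers: list[Callable[[int], bool]],
):
    """Return store[key] only if a majority still acknowledges `my_term`.

    Each peer callback returns True if that node still accepts `my_term`
    as the current leader's term (i.e. it has not seen a higher term).
    """
    acks = 1 + sum(1 for ping in peers if ping(my_term))  # count ourselves
    cluster_size = len(peers) + 1
    if acks <= cluster_size // 2:
        raise RuntimeError("not confirmed as leader; redirect the client")
    return store.get(key)
```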
To remove an articulation point from a graph, we need to:
A: remove an edge
B: add an edge
C: both A and B
D: none of the above
Random Indian guy?
Hi Tushar! Can you please make a video on the Single-Pair All Shortest Paths problem? Given a source and a destination, find all the shortest paths between them (there can be more than one path with the same length, hence all the shortest paths).
Can you please explain this?
I am running a Java application in a distributed environment with multiple MySQL instances. Can you please help me understand how all the nodes sync? In this kind of scenario, what do we need to take care of at the code level in the Java application? (Suppose a banking application with money transfers: one node handles an ATM withdrawal, and a balance check a few milliseconds later goes to a different node.)
Please explain with an example.
I think when a new leader is elected, the other nodes can learn from the metadata service that there is a new leader, so it's not necessary for the new leader to talk to the others?
Nice video. I see a problem in your leader election. Though you say that the metadata manager will elect the leader, when you explain it, it seems a node puts itself up for election and the one with the majority is confirmed as leader. The confusion is reinforced when an old leader comes back alive: instead of getting confirmation from the metadata manager, you demonstrated contention between two leaders. It seems you have mixed Paxos (where the cluster chooses its leader and there is no metadata (config) manager) with metadata-manager-based election. I may be wrong.
great content, but your presentation style kind of makes me sleepy, probably my fault.
WHAT IS THE SIGNIFICANCE OF "UNIQUE PER NODE" IN THE SEQUENCER WHEN WE ALREADY HAVE THE NODE ID IN THE SEQUENCER?
Why is a distributed database using a table? Don't databases use SQL?
Why is YouTube recommending this video to me in 2021?
What would be the differences in the metadata manager and controller when designing a database that favors availability over consistency? Maybe some practical examples of different kinds of databases would help, say DynamoDB, Bigtable, MongoDB?
Tushar is back! Looks like you've gained some weight.
Hi Tushar, thanks for the great content.
One question:
How is it going to store records with multiple columns? In the Stocks table we just have Name (key) and Price (value), but if we have a user table with a user ID (which can be considered the key), first name, last name, email, etc., how will the values be stored?
In wide-column and document-based databases we can store them as JSON, but how would it work in a key-value database?
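There are two common options, sketched here as assumptions rather than the video's design. The first serializes all columns into one value (here as JSON). The second gives each column its own composite key, such as user:42:email, so that columns can be read or updated independently. A plain dict stands in for the key-value store, and the helper names are hypothetical.

```python
import json

kv: dict[str, str] = {}  # stand-in for the distributed key-value store


# Option 1: one key per row, with all columns serialized into the value.
def put_row(user_id: int, row: dict) -> None:
    kv[f"user:{user_id}"] = json.dumps(row)


def get_row(user_id: int) -> dict:
    return json.loads(kv[f"user:{user_id}"])


# Option 2: one key per column; a single field can change without rewriting the row.
def put_column(user_id: int, column: str, value: str) -> None:
    kv[f"user:{user_id}:{column}"] = value


def get_column(user_id: int, column: str) -> str:
    return kv[f"user:{user_id}:{column}"]
```

Option 1 keeps a row together but rewrites the whole value on every update. Option 2 allows partial updates, but the columns of one row could land in different partitions unless the partitioner uses only the user:42 prefix.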
loved this video. thanks for uploading.
Welcome
hey there, I also like his videos a lot :)
Thank you Master, excellent video. I was actually asked to design a Distributed DB recently in an interview.
Is there a reference paper for the ideas presented in this video?
Hi Tushar. At 4:31, the nanosecond timestamp takes 8 bytes; how is this calculated? By my calculation, we need around 16 bytes.
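Here is a quick check. It assumes the timestamp is stored as an unsigned 64-bit count of nanoseconds since an epoch such as 1970-01-01; the exact encoding in the video is not stated in this thread.

$$
2^{64}\,\text{ns} \approx 1.845 \times 10^{19}\,\text{ns} \approx 1.845 \times 10^{10}\,\text{s} \approx 584\ \text{years}
$$

A signed 64-bit count covers about half that, roughly 292 years, which from 1970 reaches the year 2262. So 8 bytes is enough as long as the value is a raw nanosecond count rather than, say, a formatted date string.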
Bro, please make a series on greedy algorithms.
Tushar, can you share a Paxos C++ implementation for keeping a few nodes in sync?
It would be great to have the references.
I recommend emphasizing that each replication group can handle "f" crash failures, rather than simply "bad nodes". The reason is that "bad nodes" include Byzantine nodes, which cannot be handled in this setup.
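The distinction matters for sizing. With majority-based replication, tolerating f crash failures takes 2f + 1 replicas. Tolerating f Byzantine (arbitrarily misbehaving) nodes takes at least 3f + 1, using a BFT protocol such as PBFT. This tiny helper makes the arithmetic concrete; the function names are illustrative.

```python
def replicas_needed(f: int, byzantine: bool = False) -> int:
    """Minimum group size to tolerate f faulty nodes."""
    return 3 * f + 1 if byzantine else 2 * f + 1


def crash_faults_tolerated(n: int) -> int:
    """A majority (n // 2 + 1) must stay up, so up to (n - 1) // 2 nodes may crash."""
    return (n - 1) // 2
```

For example, a 3-node group survives one crash, but protecting against one Byzantine node would need 4 nodes and a different protocol.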
Very good video overall
Nice video! I have a question here. Suppose that after a network partition, the leader node that got disconnected comes back onto the network. Now this node has old data for a lot of records. Will it sync its data with the rest of the nodes in the group, or will it ignore the missed updates and simply take the new writes? (Old data on one node would not affect the overall read accuracy of the group, because the majority will have the updated data.)
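In most leader-based designs, the returning node does not keep serving its old data. It first learns that it is no longer the leader, for example by seeing a higher term. Then it catches up from the current leader by replaying the writes it missed, identified by sequence number. Skipping the catch-up is risky: if another node later fails, the stale node may become part of the next majority. The sketch below assumes the leader keeps an ordered write log; the names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    last_applied: int = 0  # highest sequence number applied locally


def catch_up(stale: Replica, leader_log: list[tuple[int, str, object]]) -> int:
    """Replay every (seq, key, value) from the leader's log newer than `last_applied`."""
    replayed = 0
    for seq, key, value in leader_log:  # log is ordered by seq
        if seq > stale.last_applied:
            stale.data[key] = value
            stale.last_applied = seq
            replayed += 1
    return replayed
```

Protocols like Raft also discard entries the old leader wrote but never committed, and they fall back to copying a snapshot once the log has been compacted.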
Good video. You should do a video about a postmortem, or about the internals of any one KV-pair/NoSQL/NewSQL DB. Peace.
Is there a book I can consider for system design? Please suggest one.
Can you suggest one good book on data structures and algorithms? I have searched the internet and found so many books that it's really confusing. If you can suggest one book that covers everything and can be used as a good reference for preparing for coding interviews, that would be great. And if that book uses C++, even better.
Refer to the latest edition of Cormen et al., Introduction to Algorithms (CLRS). It is really good for data structures and algorithms.
The way you defined sequence numbers, you cannot compare them across machines and decide which one came later unless you have a centralized time server. You cannot rely on a local timestamp to make such decisions....
Agree. I'm also thinking about how to generate sequence numbers and where to store them.
How is the trie actually stored in persistent storage?
How do you handle hot keys? A heavy request load goes to the master; how do you handle that?
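There are two common mitigations, sketched here under assumptions. One is to serve reads of a hot key from followers as well, accepting possibly stale reads. The other is to split a hot write key into N sub-keys ("salting") and combine them on read. Salting only suits values that can be merged, such as counters. The shard count and helper names below are hypothetical.

```python
import random
from collections import defaultdict

SHARDS = 8  # sub-keys per hot key (assumed, tunable)
counters: defaultdict[str, int] = defaultdict(int)  # stand-in for the store


def incr_hot(key: str, amount: int = 1) -> None:
    """Spread writes to one hot counter over SHARDS sub-keys (likely on different RGs)."""
    counters[f"{key}#{random.randrange(SHARDS)}"] += amount


def read_hot(key: str) -> int:
    """Reads fan out to all sub-keys and sum them."""
    return sum(counters[f"{key}#{i}"] for i in range(SHARDS))
```

The cost moves to the read side: every read now touches SHARDS keys. Salting therefore fits write-heavy hot keys better than read-heavy ones.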
What is the system design of the TikTok app?
Sir, please make a video on SHORTEST UNCOMMON SUBSEQUENCE using dynamic programming.
Great video Tushar! Although you could have done the entrance where you appear from one side, like in the KMP algorithm video; I believe it to be a lot more effective. Thanks!
Legend returns ~~~
Sir, are you Bangladeshi?
awesome explanation!! thanks!!
Hey man please come back with more awesome videos!
Hello Tushar Sir, Please share some knowledge on Multithreading and Concurrency. Thanks in Advance 👍🏻
You can follow the channel @DefogTech for multithreading videos.
Glad to see your video dude. Please continue making more
Well done Tushar. Well explained.
Hey Tushar. Your smile and energy are missing.
I'll just use Elasticsearch.
Thanks, Tushar. Can you make a video on choosing the right metadata manager, like ZooKeeper or Redis, for distributed systems?
Look at the first video in this series.
Tushar is a great teacher.
hi! why don't you post videos anymore?