I am a software engineer at Amazon, and your videos have helped me a lot. I was recently able to get senior and principal level offers from Oracle, Microsoft, and a few others. Thanks again. Appreciate it.
Holy moly that's awesome!! Congrats on your offers!!
Hey, can we connect on LinkedIn?
Once you've read Designing Data-Intensive Applications, all the videos become a great source for revision. Great quality videos, keep up the good work!
exactly
The intro is hilarious😂p.s. congrats on the 30k milestone!
Thanks for the great video, Jordan. Follow-up question:
In the partial failures section around 12:00, I didn't quite understand how A grabbed the lock, because the write did not succeed on at least 2 nodes, which is needed for quorum, right? I understood it as the write being successful on only one node and not on the other. So unless 2 nodes ack it, the write is not successful, which means A can't claim it has the lock yet?
Thank you for the wonderful video!
Like your other videos, really top quality stuff man. Question about partial failures around 13 minutes: why can't the node at the top accept requests and coordinate writes amongst all the other nodes in the cluster? Doesn't Cassandra do it like this? So the coordinator would know if the write request to all nodes was successful or not, and if not would revert its state.
Yeah, the main thing is you just can't rely on reverts, because what if your coordinator goes down before it's able to revert the data?
Thanks for the content! Preparing for an upcoming Mid level role, hopefully your videos will help me
Good luck!
Hi Jordan, another great video. For single-leader synchronous replication: if the replica dies, wouldn't there be some ZK to tell the leader how many acks to wait for, instead of the leader waiting for a response from the dead replica? If this makes sense, is there any other reason why sync replication in single leader is not good besides latency? It seems to me it is fault tolerant like distributed consensus.
Well then you need zookeeper, which isn't even single leader lol. But if the one replica that we're synchronously replicating to dies, presumably we'd stop writing, because the whole point of synchronous replication is that we're 100% sure that all committed data is replicated.
Hey, I love the way you teach. Can you also do these concepts as a hands-on project in Spring Boot or something, so that we can improve our coding and also learn how to test such scenarios in real life?
This is something I've thought about, but realistically would take me a very long time to do haha. In my current state, it's unlikely I can, but maybe if I have some significant life changes
Hi Jordan, thanks for those videos, very helpful,
but something isn't clear to me: how does S3 handle the fencing tokens? Doesn't the application require another layer to solve this?
S3 is a bit of a bad example because we don't own s3. But imagine you own whatever data source you're sinking to, you can just build this into the logic there.
In the linearization problem example, can't B do a read repair and only grabs the lock after the repair to mitigate it?
Sure, but how does B know that this is the state it's supposed to read repair?
@@jordanhasnolife5163 Because B reads B:2 from one node and B:1 from the other, so it knows to 'fix' the other node with B:2 (assuming quorum).
@@Keira77L-t3b In this particular case yes. Unfortunately there are cases in leaderless replication where, for example with N=3, R=2, W=2 and 3 clients writing, all three writes can achieve quorum and all 3 nodes end up with different values.
See my most recent dynamo video.
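A minimal sketch of the case described in the reply above, assuming each replica simply keeps whichever write reached it last (no timestamps or version vectors). The replica names, client values, and the `apply_writes` helper are hypothetical and only illustrate the interleaving; they are not taken from the video.

```python
# Hypothetical simulation: N=3 replicas, W=2, three concurrent writers A, B, C.
# Assumption: a replica keeps whatever value arrived last.

def apply_writes(deliveries):
    """deliveries maps a replica name to the ordered list of values it receives."""
    state = {}
    acks = {}
    for replica, values in deliveries.items():
        for value in values:
            state[replica] = value                  # later arrival overwrites earlier one
            acks[value] = acks.get(value, 0) + 1    # each delivery counts as one ack
    return state, acks

W = 2
# Each client reaches exactly two replicas, but arrival order differs per replica.
deliveries = {
    "n1": ["C", "A"],
    "n2": ["A", "B"],
    "n3": ["B", "C"],
}

final_state, acks = apply_writes(deliveries)
every_write_reached_quorum = all(count >= W for count in acks.values())
distinct_values = set(final_state.values())

print(final_state, every_write_reached_quorum)
```

Under these assumptions every writer collects two acks, yet a read with R=2 sees two different values and has no way to decide which one should win.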
In the distributed consensus section, we had 5 nodes (1 leader, 4 followers). When the leader went down, we chose the follower that was up to date. What if, as soon as that follower was chosen as leader, it went down too? Now we have 2 nodes with old data and 1 with new. What happens in this case?
The one with new data must become the leader. If it goes down, we can't proceed as we can no longer reach a majority of nodes.
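The majority arithmetic behind that answer can be sketched as follows. The `has_majority` helper is a hypothetical name, and the sketch assumes the quorum is a strict majority of the original 5-node cluster.

```python
def has_majority(reachable: int, cluster_size: int) -> bool:
    """True if the reachable nodes form a strict majority of the full cluster."""
    return reachable > cluster_size // 2

CLUSTER_SIZE = 5

# Two nodes have failed: 3 of 5 remain, which is still a majority.
can_proceed_with_three = has_majority(3, CLUSTER_SIZE)

# The only up-to-date node fails as well: 2 of 5 remain, no majority.
can_proceed_with_two = has_majority(2, CLUSTER_SIZE)

print(can_proceed_with_three, can_proceed_with_two)
```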
Hey Jordan , Could you please create a video on stock exchange system design, that would majorly focus on the users getting a notification on the stocks they have subscribed if the stock values go up or down based on some parameters in real time.
Thanks
Will do eventually!
I'm confused about something. You mentioned around 4:20 that client 1's lock will expire due to the garbage collection, but then around 5:00 you say that when client 1 comes back they "still have the lock." I thought it expired for them?
Client 1 just *thinks* it has the lock. It doesn't actually.
Thank you!
I'm kinda curious about the fencing token, since the destination write (in your video it's S3) it has to know/store the value of used fencing token so far, is that possible? Since I think that we might have to communicate with many 3rd parties which we do not have the "right" to do that check. How can we resolve it?
I'm not sure what you mean here. A fencing token is just a number that you can pass to an external service, as long as the external service allows it in their API. Then the external service will only accept writes in an increasing order of that number.
@@jordanhasnolife5163 " as long as the external service allows it in their API" - that's my point. I meant, we can't be sure that the external service we want to use will always support fencing token (or have similar thing), what should we do in that case?
@@nguyentrunghieu6200 Use a different external service or add a stateful proxy in front of it
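A minimal sketch of the stateful proxy idea, assuming the proxy is the only path to the underlying store and remembers the highest fencing token it has seen. `FencedStore` and its methods are hypothetical names, the in-memory dict stands in for the external service, and the sketch is single-threaded (a real proxy would need its own synchronization and durable storage for the token).

```python
class FencedStore:
    """Hypothetical proxy that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.highest_token = -1   # highest fencing token accepted so far
        self.data = {}            # stand-in for the real external service

    def write(self, key, value, token):
        # Reject anything older than the newest token seen. Equal tokens are
        # allowed so the current lock holder can write more than once
        # (a design choice of this sketch).
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.data[key] = value
        return True


store = FencedStore()
accepted_new = store.write("file", "written by B", token=34)
# Client A wakes up from a long pause still believing it holds the lock.
accepted_stale = store.write("file", "written by A", token=33)

print(accepted_new, accepted_stale, store.data["file"])
```

The stale writer is turned away at the proxy, so the external service never sees its write.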
Your intros are always weirdly funny :) Question: in your final design, is the queue also written to followers? i.e. If Leader were to go down, would followers know that B is waiting for A? How would the websocket be restored by A?
Yep, the queue is written to followers. If the leader goes down, the clients will notice it, and reach out to the other nodes in the ZooKeeper cluster to get the address of the new leader and connect there.
Awesome video.. one question..
What are the leader/follower nodes here: application server / Redis cache / database?
Probably just a normal SQL db
How can anyone get the notes you have made during these videos? Are they available in a GitHub repository or somewhere else?
See channel description
I'm at minute 2 of the problem, so I don't know what to expect next, but aren't S3 objects immutable? Why would one get corrupted? I get what you are trying to convey though.
Sure, let's say Hadoop then rather than S3.
Excellent job Sir
You are doing awesome work
Well when you feel you are better you don’t have to prove it to anyone, just relax and see ahead and I am sure you will find a much better girl than her !!
Oh she's fine this was just a joke lol
Nice video, but I have one query. In the case of distributed consensus, how are reads done for the lock? If it is read from a replica which is not up to date, it can lead to a problem. I have also watched your Raft videos, which give me the impression that distributed consensus provides linearizability but not strong consistency in reading.
You read from the leader. This is slow, but it's still fault tolerant, because we have the ability to perform a fail over if the leader goes down.
if writing to s3 happens inside synchronised block, whats wrong in that? Garbage collection will have no role to play in this scenario.
Can you elaborate? I don't know what you mean here.
Garbage collection just happens when we write stuff to S3 and then it turns out we don't need to use it
Why this channel name man , you are giving life to so many. Rename the channel to Jordan gives life
That's not the only thing I give
what if you have a large amount of connections on the leader node? How do you deal with that situation?
I'm assuming this is for many different locks, you basically have to partition them across many zookeeper clusters.
@@jordanhasnolife5163 does that mean this leader + several followers just act like one zookeeper node. And for horizontal scale up we need more zookeeper nodes with sharding(and every node have their own leader and followers)?
@@潘雪松-f4g You are correct
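One way to picture that partitioning is a stable hash from lock name to cluster. This is only a sketch: the cluster addresses and the `cluster_for_lock` helper are hypothetical, and plain modulo hashing is an assumption made for brevity (resizing the cluster list would remap most locks, which consistent hashing would reduce).

```python
import hashlib

def cluster_for_lock(lock_name: str, clusters: list[str]) -> str:
    """Map a lock name to one cluster using a stable hash of the name."""
    digest = hashlib.sha256(lock_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]

# Hypothetical addresses; each entry is a full leader + followers cluster.
clusters = ["zk-cluster-0:2181", "zk-cluster-1:2181", "zk-cluster-2:2181"]

first_lookup = cluster_for_lock("s3://bucket/file-a", clusters)
second_lookup = cluster_for_lock("s3://bucket/file-a", clusters)

print(first_lookup)
```

Every client hashing the same lock name lands on the same cluster, so all contention for one lock is still resolved by a single leader.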
No Flink and CDC used? Jordan, are you ok?
No someone said my videos were too unrealistic for interviews on reddit and now I'm in a deep state of depression
@@jordanhasnolife5163 I think that's true, but I don't necessarily think it's bad. Your videos aren't really structured in an interview style. You go into a lot of depth in your designs. It would be unrealistic to draw the sometimes huge designs you show at the end of your videos in the space of an interview, especially somewhere like Meta where you realistically only have 35 minutes of design time, but it's beneficial to see everything as inspiration for how you might deep dive in different areas. In a real interview you may only deep dive into a couple of the areas you show.
Is linearizable similar to causal consistency?
Causal consistency just implies that if a write B happened because a user first saw write A, we should never be able to read B without also having access to the A write
Linearizable databases are causally consistent, but not all causally consistent databases are linearizable.
@@jordanhasnolife5163 interesting, we learn something new everyday! Great video man
Hi Jordan, really appreciate the content, is it possible for you to share your ipad notes. It is difficult to follow and revise your content without the notes and making the entire notes while following the video is time consuming.
It would be really helpful if you could share your hand written notes from ipad (maybe it is not perfect but still a better reference than nothing) which we could keep as reference to follow your content. As we go through the video, we could add our own comments or notes on it to make it more clear. Please consider.
Planning on doing this in bulk after finishing my current series, this will be in the next 1-3 months.
What editor are you using for drawing? Do you also use any pen based device?
Apple pencil + oneNote
@@jordanhasnolife5163 thanks for your response 🫡
Maybe an irrelevant question, but how will things work when there are multiple resources to contend for? E.g. one S3 file, a second S3 file, maybe a customer's SMS channel?
How will distributed consensus work then?
You use a separate lock for the other file.
@@jordanhasnolife5163 when the master node fails then backup node1 have latest version of lock 1 but node2 has latest version of lock2. Then who will be the leader?
I had to debate with myself if I left the video with 69 comments or if I added a comment to help your video with the algorithm...
Congrats, what an intro
I got rejected in the system design round LoL. I took it lightly and didn't prepare
Welcome
LET'S GO KNICKS!
The content is great, but I always come for the golden nugget in the description
I churn out nuggets in the description and on the toilet
Thanks for the great video. I need your help with one of my tasks. I will become your Patreon supporter if you help. At my current company I have received a task in which I have to execute queries in the order they were originally executed. I have a list of queries and their original start and end times. So to execute them again in the same order we need to build a dependency graph. How can we build this dependency graph?
Query 1: start time 1 end time 3
Query 2 start time 2 and end time 5
Query 3 start time 4 end time.
Query 2 can start after query 1 has started. Query 3 can start after query 1 has finished and query 2 has started.
My implementation is not efficient, as for every query I check all the queries that started before it and store the dependencies in a list.
Doesn't really make much sense to me considering the start times and end times. But look up topological sorting. Make a graph of the dependency relationships, and run a topological sort. This will tell you when you can schedule a given task, at which point you can run a second job that looks at currently scheduled tasks and whether their start time has passed.
I don't have a patreon, send it to charity.
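A minimal sketch of the topological sort suggestion, using the standard library's `graphlib` (Python 3.9+). The query names are hypothetical, and the sketch simplifies the question by collapsing "after it started" and "after it finished" into a plain "must come before" edge.

```python
from graphlib import TopologicalSorter

# Each key maps a query to the set of queries that must be scheduled before it.
# Simplification: started/finished are not modelled separately.
dependencies = {
    "q1": set(),
    "q2": {"q1"},          # query 2 waits for query 1
    "q3": {"q1", "q2"},    # query 3 waits for queries 1 and 2
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['q1', 'q2', 'q3']
```

If the graph contains a cycle, `static_order()` raises `graphlib.CycleError`, which is a useful sanity check on the recorded start and end times.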
What is the source of truth for all this?
I'm not sure what you're asking - do you mean my sources?
@@jordanhasnolife5163 Thanks for replying! I meant where did you learn about all this? Is there a comprehensive resource or is this just result of your years of experience in tech?
@@sid4579 Well considering that I don't have many years of experience in tech, I'm going to say that I did not learn anything that way. I'm simply just aggregating any information that I can find across anywhere on the internet. If there was a comprehensive resource for it, I don't think I'd be making these videos in the first place, as I myself am attempting to be a comprehensive resource for it.
@@sid4579 Martin Kleppmann's book and YouTube videos cover a lot of this.