Wouldn't there be a significant risk of data mismatch in the tree? What if one tree has 762 and 127 and the other has 761 and 128? Wouldn't that lead to a condition where one database mistakenly thinks the other has data that it doesn't?
Multi leader and leader less replication looks almost same in the sense that we would be writing data to multiple nodes. My question is can we use crdt for replication in leader less system as well?
How do we have access to the other node's merkle tree, while doing the comparison? Would it be sent on network? Then it kind of defeats the purpose. Also the merkle tree can be huge if I understand correctly, kind of 2x the size of the DB. Isn't that a lot of data to communicate over the network from one node to another? May be I didn't understand correctly, please correct me if I'm wrong.
Hey Rahul - you send the root hash of the merkle tree over the network first (small), which tells you if you have differences, and then just traverse down the merkle tree where the hashes are not the same to see the differences in rows. Merkle tree is just a tree of hashes. I have a dedicated video on them on my channel from a couple of years ago, maybe worth checking that one out!
Jordan when you mention snapshot versioning right at the beginning are you talking about vector versioning? Or are you simply referring to how a db may store the previous write along the current write?
@@jordanhasnolife5163 Around 2:12. I think the main idea is that every write has a version number associated with it. In DDIA, they give an example on the leaderless replication chapter (pg 189) between two clients trying to write to a single replica. I think vector versioning (later on, in that same chapter) is this same concept but applied to multiple replicas. Does that make sense?
@@timothyh1965 so yeah in this case basically that number would be a timestamp assigned by whatever node "coordinates" the write. Cassandra can also do version vectors IIRC, but the example I'm showing here is simple "last write wins" resolution.
If both the databases have multiple mismatched writes, then would the complexity still remain O(log n)? It could be possible that both the left and right nodes are mismatched, in which case both sub-trees will need to be traversed, thus making the time complexity greater than log n
Man U R the GOAT, I was looking for companion lectures for DDIA and found your playlist
You genuinely have the best system design videos I’ve ever encountered, by a significant margin.
Thanks James!! That's all I'm going for!
Hey thanks Jordan you explained this really well.
I was expecting Sequence CRDTs next after seeing the previous video 😅. But this one is cool
Being insane is part of what makes up a good engineer
Insanity and laziness
Cool. This kind of tree is surprising. Thanks for sharing
Loved it, let's implement it on postgres make two databases and ...
Wouldn't there be a significant risk of data mismatch in the tree? What if one tree has 762 and 127 and the other has 761 and 128? Wouldn't that lead to a condition where one database mistakenly thinks the other has data that it doesn't?
Well I'd say that as long as the way that we create the tree/hashes is deterministic that two nodes with the same data should not have that issue
Hey Jordan,
when the merkel trees mismatch and we found the row that is mismatching, which row of the two dbs do we pick?
Whichever has the higher version vector or timestamp
Multi leader and leader less replication looks almost same in the sense that we would be writing data to multiple nodes. My question is can we use crdt for replication in leader less system as well?
I guess technically the merkle tree that we use in leaderless replication for anti entropy is kind of a type of CRDT
How do we have access to the other node's merkle tree, while doing the comparison?
Would it be sent on network? Then it kind of defeats the purpose.
Also the merkle tree can be huge if I understand correctly, kind of 2x the size of the DB.
Isn't that a lot of data to communicate over the network from one node to another?
May be I didn't understand correctly, please correct me if I'm wrong.
Hey Rahul - you send the root hash of the merkle tree over the network first (small), which tells you if you have differences, and then just traverse down the merkle tree where the hashes are not the same to see the differences in rows.
Merkle tree is just a tree of hashes. I have a dedicated video on them on my channel from a couple of years ago, maybe worth checking that one out!
Ahh understood how that helps, thanks a lot!
Will also checkout that video.
Jordan when you mention snapshot versioning right at the beginning are you talking about vector versioning? Or are you simply referring to how a db may store the previous write along the current write?
Would you mind giving me a timestamp
@@jordanhasnolife5163 Around 2:12. I think the main idea is that every write has a version number associated with it. In DDIA, they give an example on the leaderless replication chapter (pg 189) between two clients trying to write to a single replica.
I think vector versioning (later on, in that same chapter) is this same concept but applied to multiple replicas. Does that make sense?
@@timothyh1965 so yeah in this case basically that number would be a timestamp assigned by whatever node "coordinates" the write. Cassandra can also do version vectors IIRC, but the example I'm showing here is simple "last write wins" resolution.
@@jordanhasnolife5163 awesome. Thanks man love your videos
Could you pls do a video on Data Mesh technology? Thanks!
Yeah I've gotta research this a bit more, not sure how relevant it is to the systems design interview.
If both the databases have multiple mismatched writes, then would the complexity still remain O(log n)? It could be possible that both the left and right nodes are mismatched, in which case both sub-trees will need to be traversed, thus making the time complexity greater than log n
That's correct, in the worst case everything is different and now we have to linearly traverse the tree.
@@jordanhasnolife5163 Thank you! And I love your videos!!
Hey jordan, Is the video on sequence CRDT missing in the playlist, or not made?
It's in the google docs system design video
@@jordanhasnolife5163 Awesome, thanks.
Did you pass a systems design interview to get your current job ?
Nope! Ironic isn't it? Passed other systems design rounds for different quant firms during the interview process
Jordan, how are you learning these concepts? Do you read papers or are you referencing a book?
Both, in this video it's mostly based off DDIA
Learned about binary search trees in 2016
And i am learning about their real use cases in 2024 :O
And yes i'm old :/
Thanks Jordan!