Adding some more clarity here:
Q. We say that in Cassandra all reads and writes go to one partition, and we have very little support for distributed transactions; but at the same time Cassandra uses a quorum of nodes for reads. Is it that by "all reads and writes go to one partition" we mean they go to a single node after partitioning? How does it make sense to use a quorum of nodes to read, then?
ChatGPT said:
You’re right in noting that Apache Cassandra operates differently compared to traditional databases, especially regarding how it handles reads, writes, and consistency. Let me clarify how Cassandra’s architecture handles these operations and why quorum-based reads still make sense.
Data Distribution and Partitioning
Partitioning: In Cassandra, data is distributed across nodes using a partitioning scheme. Each piece of data is assigned to a partition based on its partition key. This partition key determines which node (or nodes) will store that data.
Single Node Writes: When a write is performed, the request is handled by a coordinator node (any node in the cluster can play this role for a given request). The coordinator then forwards the write to the replicas responsible for that partition, according to the replication factor. For example, if the replication factor is 3, the data is written to three different nodes.
Single Node Reads: Reads are a bit more complex. The coordinator node handles the read request. Depending on the consistency level specified for the read operation, the coordinator node will contact a certain number of replicas to retrieve the data. This is where the quorum concept comes into play.
Quorum Reads and Writes
Quorum: A quorum is a majority of the replicas in a given set. For example, with a replication factor of 3, a quorum is 2 (in general, a quorum is floor(RF / 2) + 1).
Consistency Levels:
QUORUM Read: When a read request uses the QUORUM consistency level, the coordinator node will read the data from a quorum of replica nodes. This ensures that the data retrieved is consistent and up-to-date, as a majority of the replicas agree on the value.
QUORUM Write: Similarly, for writes, if the QUORUM consistency level is specified, the write operation must be acknowledged by a quorum of replica nodes before it is considered successful. This helps ensure that the write is durable and consistent.
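The quorum arithmetic above can be sketched in a few lines (a toy illustration, not Cassandra code; `quorum` and `overlaps` are hypothetical helper names):

```python
# Sketch of quorum sizing and why R + W > RF guarantees overlap:
# any read set of size R and write set of size W must then share at
# least one replica, so a quorum read always sees the latest quorum write.

def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def overlaps(r: int, w: int, rf: int) -> bool:
    """True when the read and write replica sets must intersect."""
    return r + w > rf

rf = 3
q = quorum(rf)                   # 2 for RF = 3
assert overlaps(q, q, rf)        # QUORUM reads + QUORUM writes overlap
assert not overlaps(1, 1, rf)    # ONE/ONE can miss the latest write
```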
How It All Fits Together
Coordinator Role: The coordinator node plays a central role in both reads and writes. It handles the routing of the request and gathers responses from the appropriate replica nodes.
Quorum-Based Reads: Even though all writes and reads for a partition go through a single coordinator node, the actual data is distributed across multiple replicas. By using a quorum of nodes for reads, Cassandra ensures that the data retrieved is consistent with the most recent write. The coordinator node verifies the data with a majority of replicas to ensure accuracy.
Consistency vs. Availability: Cassandra prioritizes availability and partition tolerance (according to the CAP theorem), and consistency is tunable based on the consistency level. By using quorum-based approaches, Cassandra balances between strong consistency and high availability.
In summary, while reads and writes are directed to specific partitions, the use of quorum levels for reads and writes involves multiple nodes to ensure data consistency and fault tolerance. The coordinator node handles the routing but relies on a majority of nodes to validate the read or write operations, thus maintaining consistency across the distributed system.
---------------------------------------------------------------
So: consistency is tunable per request. At consistency level ONE, only a single replica needs to respond (a "quorum" of size 1), trading consistency for availability, while QUORUM reads and writes give stronger guarantees. Cassandra uses leaderless replication, and it's not happy with cross-partition transactions (no 2PC); all the relevant data for a transaction should be kept in one partition.
I hope this is correct lol
A major selling point for MongoDB is that its server and its query engine/language are optimized for replication and sharding. PostgreSQL, in comparison, is a bit harder to scale horizontally or to replicate for resilience. However, the tooling available for SQL databases is orders of magnitude better and larger compared to both MongoDB and Cassandra. If you have to pick only one, PostgreSQL (or similar) is almost always the best choice. There's also CockroachDB.
We thank you for having no life and making these videos 🙂. Loved this one too.
Cassandra's data integrity can be made better by configuring quorums.
However, due to hinted handoff, it still cannot provide transactional/strong consistency guarantees.
True, thanks!
@2:20 Your information is wrong there. I think you wanted to say that there are Partition keys (at least one) and Cluster keys (0 or more). Together they form the Primary key. Please review the video as it is misinforming thousands of people.
Oops, typo on my part. If this were a bigger deal I'd revise the video, but I don't think anyone is losing their job due to using the wrong terminology on cassandra key names.
@7:36 you mentioned B-tree databases write to disk first. However, with a WAL in combination with B-trees, do DB writes still go to disk? I was under the impression that they first write to the WAL and then asynchronously flush from memory to the B-tree on disk. Please correct me if I am wrong.
I imagine this one is implementation specific, but I do think that some B-tree pages get cached in memory, so in this case it's certainly possible!
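As a sketch of the write path being discussed (a generic WAL-plus-buffer-pool engine; `MiniEngine` and its fields are invented for illustration, not any real database's internals):

```python
# Hypothetical sketch of a WAL + buffer-pool write path: the record is
# appended to the log first for durability; the B-tree page is only
# updated in memory, and flushed to disk later by a checkpoint.

class MiniEngine:
    def __init__(self):
        self.wal = []    # stands in for an fsynced append-only log
        self.pages = {}  # dirty in-memory B-tree pages
        self.disk = {}   # pages as they currently exist on disk

    def write(self, key, value):
        self.wal.append((key, value))  # durable once this append is synced
        self.pages[key] = value        # page modified only in memory

    def checkpoint(self):
        self.disk.update(self.pages)   # asynchronous flush of dirty pages
        self.pages.clear()

db = MiniEngine()
db.write("a", 1)
# "a" is already durable (it's in the WAL) even though the on-disk
# page is stale; a crash here would be recovered by replaying the WAL.
db.checkpoint()
```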
Nice video.
I have a question. I read somewhere that "While transactions in MongoDB can span multiple documents within a shard, they cannot span multiple shards within a sharded cluster." That limits it in comparison to traditional SQL, doesn't it?
Interesting - if that's the case then I'd agree, I suppose you could always build two phase commit on top of it yourself though
Considering I don't have prev experience in Casandra, how confidently I'll be able to talk about this in Interview and clear system design round 😢😢
You've got about the same amount as me then. I don't think you should need a ton of experience with a particular technology to know some high level details about it
@@jordanhasnolife5163 This gives me a lot of confidence..Thank you so much..Love all your videos
Is there any downside of MongoDB's ACID guarantees on de-normalized data, i.e. are writes slower than MySQL? It seems almost too good to be true, esp. say if a write has many duplicated locations it needs to transactionally update
Also, I know Cassandra's commonly used for chat applications as you mentioned, but I'm having trouble understanding why it's strictly preferable to, say, a document store. Say you had cluster key = chatID, sort key = timestamp, and one more column chatMessage = message contents. Yes, the chatMessages have good locality, but many of COS's other benefits (encoding compression, dictionary compression) aren't really relevant on content as variable as chat messages.
I think you just have a pretty large degree of writes with chat messages due to all of the fan-out, so LSM trees and leaderless replication may be preferred.
Can't see off the top of my head how mongo writes compare to SQL ones, but I don't think the metrics that you'd find would account for data being especially denormalized - that would just mean you have to do *more* writes
Why do the nodes share the hash ring configuration via a gossip protocol instead of using something like ZooKeeper? Is there some kind of benefit to using a Gossip protocol instead? Or am I misunderstanding this.
In the Dynamo paper they just wanted to design for every node to have homogenous functionality in the system and not rely on anything external. You could have used ZK.
@@jordanhasnolife5163 I see that makes sense since it's a system in itself why take a dependency on another external system if it's not needed. Thanks for the background info and great work with the video.
Can you please explain, at 3:06, how the Gossip protocol and Consistent Hashing are connected? I understand that in the gossip protocol we basically designate a start node and propagate state updates to the rest of the nodes, but how does consistent hashing come into play?
The gossip protocol is responsible for making sure all nodes see an up-to-date view of the consistent hashing ring.
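A minimal illustration of what gossip has to keep consistent (a toy consistent-hash ring, not Cassandra's token implementation; the names are hypothetical):

```python
# Toy consistent-hash ring: each node owns the arc of the ring up to its
# token, and a key is routed to the first node whose token is >= the
# key's hash (wrapping around). Gossip's job is to keep every node's
# copy of this token map in sync, so all nodes route keys identically.
import bisect
import hashlib

def token(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.tokens = sorted((token(n), n) for n in nodes)

    def owner(self, key: str) -> str:
        t = token(key)
        i = bisect.bisect(self.tokens, (t, ""))
        return self.tokens[i % len(self.tokens)][1]

ring = Ring(["node-a", "node-b", "node-c"])
# Any node holding the same token map computes the same owner.
print(ring.owner("chat:42"))
```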
You mention that B-tree writes originally go to disk. That's wrong. PostgreSQL writes go to B-tree index pages which reside in memory.
That's a good point and something I ought to caveat with more frequently
@@jordanhasnolife5163 Thanks
0:56 "you would use a single leader replication but that would just be sharded to get higher write throughput" I thought being single leader means writing to the leader node only and sharding helps with read throughput?
Oh, is it that different partitions have different leaders?
Yes
Hey Jordan, let's say we have 5 partitions of Cassandra, each storing some data. Is there a way to find the oldest entry across those partitions, considering there are multiple keys per partition (like senderId, receiverId)?
Unless you want to use something like a version vector, your easiest bet is probably just a timestamp lol
4:07 When you say on Cassandra all reads and writes should go to ONE partition. Does that mean after a consistent hash, a partition on one node? But I thought read is from a quorum? I guess I'm thinking one partition means a partition on a single node. That's not what a partition means?
Sorry in this case I mean each key belongs on one explicit data partition meaning that you should only have to hear from one node in the cluster to read/write it (depending on how you configure quorums).
I think the difference is that quorum reads and writes have more to do with getting the most up to date value returned from the db. Cross partition reads/writes are related to how you are reading/writing data after you have partitioned a table.
For example if my transaction involves writing to two different partitions, maybe I write to a table in one partition and some secondary index in another, then Cassandra doesn’t really like that.
On the other hand, I can still read from multiple writes/quorum writes to several nodes to ensure that I get the most up to date value when reading
Is Cassandra only used for chat applications, or are there other benefits? Because if read throughput is poor, then historical chat data retrieval would take a hit on overall latency, wouldn't it?
Read throughput is still good if you only read from one partition and take advantage of the sort key
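To illustrate that point (a toy in-memory model, not Cassandra; the chat_id/timestamp layout mirrors the partition-key/clustering-key idea):

```python
# Why single-partition chat reads stay cheap: all messages for one
# chat_id live together, kept sorted by the clustering key (timestamp),
# so fetching a page of history is one partition lookup plus a
# contiguous range scan. Names here are hypothetical.
import bisect
from collections import defaultdict

messages = defaultdict(list)  # chat_id -> [(timestamp, text)], sorted

def insert(chat_id, ts, text):
    bisect.insort(messages[chat_id], (ts, text))

def history(chat_id, since_ts, limit=50):
    rows = messages[chat_id]
    i = bisect.bisect_left(rows, (since_ts,))
    return rows[i:i + limit]

insert("chat-1", 100, "hi")
insert("chat-1", 105, "hello")
print(history("chat-1", 100))  # → [(100, 'hi'), (105, 'hello')]
```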
Unrelated to the video, but I'm surprised that you're still able to consistently make content for youtube. Figured you would actually have no life with the new job.
Ah I basically do lol, to be honest if I was lifting less I'd have a decent amount of free time
If MongoDB provides the same facilities as MySQL, then why isn't everyone using it?
Because everyone is used to the relational data model and there's not really any reason to switch.
Web scale
Amazing summary.
1. Can you do a similar video on DynamoDB and compare it to Cassandra or other NoSQL DBs?
Separately, what's the notepad software you are using?
Microsoft one note - to be honest DynamoDB is closed source so I don't know the exact design lol
Cluster key and sort key are the same thing. You meant to say partition key and clustering key.
Derp, nice catch