IMPORTANT: Don't forget the Memtable is stored in memory, so if the system crashes, that data will be lost. To avoid losing data, we can maintain a separate log file on disk. Every time we write to the Memtable, we'll also append that write to the log file (no need to sort it as we just use it to restore after a crash). Then once the Memtable contents get written out to a SSTable file, we can erase the log file. That way the log helps us avoid losing writes stuck in memory when a crash happens.
LSM trees are actually used a lot in modern SQL databases as well. the idea is to represent a relational table as a series of packed records - [k: primary key v: field 1, field 2]. Indices can be created on other fields by creating additional k:v pairs - [k: field 1 v: primary key] so an index scan just becomes two NoSQL lookups for each result.
Spot on. One thing to keep in mind is that all databases, SQL or noSQL, are based on the principles of key value operations and the transaction consistency algorithms that allow for data integrity. The core calculus and proofs behind these systems are all done in the perspective of only two types of operation: read and write, each to some concept of a key. These algorithms vary wildly and come with varied pros and cons, some with very weird limitations that actually DO prevent SQL semantics (the FaunaDB and FoundationDB distributed algorithms require the W/R set to be known before starting the transaction, and SQL requires conditional and unbounded W/R sets to fully abide by the spec).
@@greyowl3787 write/read. Basically, Fauna and Foundation use a distributed transaction system that can only provide full ACID guarantees with exact read and write keys known before the transaction even starts. This means dependent querying is usually not possible, like "SELECT name IN users WHERE id = (SELECT friend IN users WHERE name = x);" simply because the result of the inner query changes the write set of the outer query based on its result. In some cases a good query engine can narrow down to a fixed "maybe W/R" set if the results are well bounded (either single element or known countable range), but generally they don't bother with SQL support for this reason.
That's the best system design content I have ever seen on youtube ! This channel is absolutely amazing. It must be tough to squeeze all that valuable knowledge into less than a 10 minutes video. Keep up the excellent work!
This is the most information-dense video on what's so special about NoSQL databases I ever saw. Not a word was superfluous, but the key concepts were clearly transmitted.
VERY informative without going into too much complexity. THANKS and congrats for a great video. I'm an MS SQL Server DBA, and the high level explanation you provided was awesome. Thanks again. Best, R.
That's a cool trick. So what it is essentially doing is spreading out the computer's resources across time, where with a traditional database you will get lots of spikes in processing on the timeline when doing reads and writes. Mind you the attraction of SQL was that you could have multiple indexes and create custom views on the data in a highly relational way, but granted that was very expensive on resources. I think traditional databases were mostly optimised for the deficiencies of the mechanical hard disk drive. I think it may end up as redundant in future as we store our data more on chips. Thanks for the video. I like videos that don't waste your time with long BS intros.
You read my mind when it comes to "spreading the computer's resources across time"... Which for some use cases makes perfect sense, but not all use cases. Not sure I agree with the "deficiencies of mechanical HDDs" part of your comment though. But great comment overall.
I never experienced such a communicative video with such a simple and easy explanation.. Thanks Alex ..Please keep it up and upload more such videos. I have anyway bought both volume of System Design Book. Thank you so much !!!
Great explanation. Resolved all my doubts on how NoSql DBs work. However, I wanted to understand 1) Whether the balanced tree and keys in sorted set is only the object key with pointer to data value or it also contains the actual data? 2) Can a NoSql DB index multiple keys? 3) Why can't SQL DB also implement flushing mechanism in order to speed up writes? I know that they are highly consistent so they need to persist data to disk, but they can simply append the entry in a log file just like NoSql DBs do, and in case of a network partition, first check the log file to sync data in actual database?
I've often wondered how NOSQL databases can achieve higher write throughput than relational DB. Thank you for sharing the techniques involved. Your explanation is clear and the graphics are excellent!
Nicely explained. I read all this information in designing data intensive application book. But few topics like memorable and sstables were still bit unclear to me. Got the whole idea now. Great stuff. Keep going
Great work! I wish I had watched this video before trying to learn the architecture behind RocksDB by only looking their documentation, hahaha! Awesome work!
Like many have said, this is a great dense video at a good beginner depth for people like me. I hope there will be a relational db video that complements this if at all possible, since the quality level is so high 🤞
One additional thing to mention (around ruclips.net/video/I6jB0nM9SKU/видео.html perhaps) is that the writes are written to memory AND a transaction log to ensure durability. Otherwise whatever was in the MemTable will not persist after a crash. The transaction log can be replayed to rebuild the MemTable.
Thank you for the feedback. We did have that initially, but decided to take it out to focus on the LSM tree itself. We knew someone would bring it up, but didn't think it would take this long. 😂 We are glad you did.
Thanks for the video. Your explanation rises a concern to me : since the memtable is in memory, what happens if the server crashes before flushing ? Is that memtable distributed or replicated ?
Generally for the memory table write ahead log is maintained. Once the memory table is moved to create SSDTable write ahead log is deleted. In case of crashes write ahead log can be used to restore the memory table.
I think the movie is recommended to left 30 seconds instead of 10s for the ending credit because the next suggestion video covers (in the left bottom side) in the summary time
Question regarding the books: volume 1 and volume 2 are considered as two separate editions for the same content or the same edition for different contents? Another way to ask: Should I but both volumes or just volume 2?
Object ID, unless you design the system to use a high precision timestamp as the id, which could maybe be an interesting idea. If the Object ID is a timestamp, then object creations are already sorted which could further boost write performance, though there is probably no advantage in the case of updates or deletes.
What would be a good example of an application benefiting from "fast write slow read" property of NoSQL DB? Based on what's presented, I'd say most user-facing application, like a typical service a startup would build (e.g. personal calendar organization, etc) doesn't sound like a good fit given reads are pretty important in user-facing traffic.
Very cool. Could have - one process just taking DB writes and putting them in memory - another writes too-big variables to files on disk - next would go through files and flatten them (like continuous truncate/shrink on a transaction log) - last would take DB reads and go through the memory, bloom filters, and file structure to find and return the requested data.
How is a key-value actually stored and retrieved from disk? Although a SSTable is sorted within itself, since the keys in different SSTables are not related, how does a read request find a key? Surely not by trying all SSTables one by one, that'd be too slow.
this is brilliant! I'm also wondering would this amount/level of knowledge for an advanced DS is enough for tech/system design interview? Obviously I think the interviewer won't ask for implementation so... ig im trying to know how deeper do we need to go than e.g. this channel's few minutes videos?
Thanks! One of the things I find difficult to find good information on is structuring data. Given I want these x properties, how do so arrange the information to get them, and what technologies are required to do it.
The explanation of why SStables have to be sorted is not sufficient. In my opinion, the reason is that allowing the SSTables to be sorted allows faster merging and allows the index for the table to be sparse. It doesn't really allow faster table retrieval since running a binary search on disk is not so efficient.
6:56 correction regarding Bloom Filter If the Key doesn't exist, bloom filter will NOT return False Negative If the Key doesn't exist, bloom filter MAY return False Positive In other words. If bloom filter says there's NO key in page, there's definitely no key in page. If it says that there's the key in page, there MIGHT be the key in page with some probability. This probability data structure helps to reduce unnecessary reads.
Cassandra initially had size tiered only and later borrowed leveled from RocksDB to solve the space amplification problem, so it's not completely misleading.
IMPORTANT: Don't forget the Memtable is stored in memory, so if the system crashes, that data will be lost. To avoid losing data, we can maintain a separate log file on disk. Every time we write to the Memtable, we'll also append that write to the log file (no need to sort it as we just use it to restore after a crash). Then once the Memtable contents get written out to a SSTable file, we can erase the log file. That way the log helps us avoid losing writes stuck in memory when a crash happens.
This seems like a fairly important detail.
Kafka is used as the logging system by default in most scenarios
a write-ahead log.
This is the perfect amount of depth and overview I’m looking for. Great videos and visuals!!
Yes, awesome video.
+1
LSM trees are actually used a lot in modern SQL databases as well. the idea is to represent a relational table as a series of packed records - [k: primary key v: field 1, field 2]. Indices can be created on other fields by creating additional k:v pairs - [k: field 1 v: primary key] so an index scan just becomes two NoSQL lookups for each result.
Spot on. One thing to keep in mind is that all databases, SQL or noSQL, are based on the principles of key value operations and the transaction consistency algorithms that allow for data integrity. The core calculus and proofs behind these systems are all done in the perspective of only two types of operation: read and write, each to some concept of a key.
These algorithms vary wildly and come with varied pros and cons, some with very weird limitations that actually DO prevent SQL semantics (the FaunaDB and FoundationDB distributed algorithms require the W/R set to be known before starting the transaction, and SQL requires conditional and unbounded W/R sets to fully abide by the spec).
@@romannasuti25 what do you mean by “W/R set” ?
@@greyowl3787 write/read. Basically, Fauna and Foundation use a distributed transaction system that can only provide full ACID guarantees with exact read and write keys known before the transaction even starts. This means dependent querying is usually not possible, like "SELECT name IN users WHERE id = (SELECT friend IN users WHERE name = x);" simply because the result of the inner query changes the write set of the outer query based on its result. In some cases a good query engine can narrow down to a fixed "maybe W/R" set if the results are well bounded (either single element or known countable range), but generally they don't bother with SQL support for this reason.
@@greyowl3787 write/read i assume
Bruh, this is just awesome... keep going
This guy is better than my teamlead in term of explaining a concept on NoSQL, thank you , make my day
I appreciate your efforts. Thanks for making system design more palatable than ever.
That's the best system design content I have ever seen on youtube ! This channel is absolutely amazing. It must be tough to squeeze all that valuable knowledge into less than a 10 minutes video. Keep up the excellent work!
This is the most information-dense video on what's so special about NoSQL databases I ever saw. Not a word was superfluous, but the key concepts were clearly transmitted.
One of the concise video to understand how elastic/lucene is using these things to fast write and read.
Great work man!
I can't recall the last time I stumbled upon such great material! Fascinating work!
This channel is by far the best channel that I've found on yt about tech
VERY informative without going into too much complexity. THANKS and congrats for a great video. I'm an MS SQL Server DBA, and the high level explanation you provided was awesome. Thanks again. Best, R.
Really awesome overview of how SQL and NoSQL differ. Agree with Joel below me just the right level of detail to provide value.
That's a cool trick. So what it is essentially doing is spreading out the computer's resources across time, where with a traditional database you will get lots of spikes in processing on the timeline when doing reads and writes. Mind you the attraction of SQL was that you could have multiple indexes and create custom views on the data in a highly relational way, but granted that was very expensive on resources. I think traditional databases were mostly optimised for the deficiencies of the mechanical hard disk drive. I think it may end up as redundant in future as we store our data more on chips.
Thanks for the video. I like videos that don't waste your time with long BS intros.
You read my mind when it comes to "spreading the computer's resources across time"... Which for some use cases makes perfect sense, but not all use cases.
Not sure I agree with the "deficiencies of mechanical HDDs" part of your comment though. But great comment overall.
The dark mode was awesome. I think it could be the default from now on
I came here to say this! +1 for dark more, finally!
These videos are superb man, keep up the good work
I love these byte sized video, helps introduce new concepts.
This channel is such a gold mine!
I never experienced such a communicative video with such a simple and easy explanation.. Thanks Alex ..Please keep it up and upload more such videos. I have anyway bought both volume of System Design Book. Thank you so much !!!
besides bloom filter, a sparse index is to help find a key quickly so we only look into a small number of sstables
These videos are incredible! Very well paced and presented!
Great explanation. Resolved all my doubts on how NoSql DBs work. However, I wanted to understand
1) Whether the balanced tree and keys in sorted set is only the object key with pointer to data value or it also contains the actual data?
2) Can a NoSql DB index multiple keys?
3) Why can't SQL DB also implement flushing mechanism in order to speed up writes? I know that they are highly consistent so they need to persist data to disk, but they can simply append the entry in a log file just like NoSql DBs do, and in case of a network partition, first check the log file to sync data in actual database?
Re 3 afaik that's what they already do. But sooner or later it has to write them to b-tree, which I guess is the real bottleneck.
LOVE THE DARK BACKGROUND! :D:D:D:D (Also the video of course!!)
great, you make it sound so simple, I'm writing my own nosql db this weekend ;)
did you start writing
well, did you?
I've often wondered how NOSQL databases can achieve higher write throughput than relational DB. Thank you for sharing the techniques involved. Your explanation is clear and the graphics are excellent!
Stuff like this is the future of many forms of education.
Nicely explained. I read all this information in designing data intensive application book. But few topics like memorable and sstables were still bit unclear to me. Got the whole idea now. Great stuff. Keep going
Good video. For the further topic development it'd be interesting to see a LSM tree vs Redis RDM/AOF persistence schema comparison.
Great explanation, thank you, this helped reinforce my learnings about LSM-trees from reading. The graphics were especially helpful
Great video. I'm a junior dev that has to implement a chat feature in the next few weeks. This really helped me understand NoSQL. Thanks.
Great visualization of "Designing Data-Intensive Applications" book
Great work! I wish I had watched this video before trying to learn the architecture behind RocksDB by only looking their documentation, hahaha! Awesome work!
Thank you. great video, as always.
I'd like if you guys could go into more detail
Sublime, superb, excellent ... again.
Alex - Your lectures and contents are great assets to software engineering community.
Not at all a question I had, but glad I watched because I extensively use Mongo for my own app
great explanation for beginners
Another amazing video! This format is so vaiuable.
I can't wait for my next systems design interview!
Like many have said, this is a great dense video at a good beginner depth for people like me. I hope there will be a relational db video that complements this if at all possible, since the quality level is so high 🤞
One additional thing to mention (around ruclips.net/video/I6jB0nM9SKU/видео.html perhaps) is that the writes are written to memory AND a transaction log to ensure durability. Otherwise whatever was in the MemTable will not persist after a crash. The transaction log can be replayed to rebuild the MemTable.
Thank you for the feedback.
We did have that initially, but decided to take it out to focus on the LSM tree itself.
We knew someone would bring it up, but didn't think it would take this long. 😂
We are glad you did.
I was actually wondering if this were a difference with what I understand of relational dbs. Thanks for pointing it out.
I really appreciated these videos !! Thanks you very much
I would to know what software are you using to produce such presentation 🙏
Great Content, thank you very much.
I'm already waiting for Tuesday for a new video, as much as I wait for a new One Piece episode.
amazing content, thank you for all the hard work you do!
in their book, bloom filters are stored on disk, but here they're shown to be in memory. Hopefully we'll get some eventual consistency
Perhaps the best educational systems content on the whole of RUclips right now. Great stuff
Please create a video on size tiered and level compaction strategy as well
This is so high quality. Thank you :)
Excellent presentation!
Man this video is awesome, so much information loved it!
LSM Tree에 대한 이해와 작동 방식에 대한 개요를 알 수 있었습니다. ㄳ 합니다
one of the best videos so far
Absolutely amazing. Thank you for making the video!
This is an awesome overview of LSM tree. If someone wants dig deeper read "Design Data intensive applications".
It is an awesome book.
Looking at how beautifuly my tired brain understood this, this video deserves a noble prize.
The next step after watching this video is to … watch it again 😂. Awesome as always.
perfect explanation, thanks
Thanks for the video. Your explanation rises a concern to me : since the memtable is in memory, what happens if the server crashes before flushing ?
Is that memtable distributed or replicated ?
Cassandra, for example, still has a write ahead log.
Generally for the memory table write ahead log is maintained. Once the memory table is moved to create SSDTable write ahead log is deleted. In case of crashes write ahead log can be used to restore the memory table.
very well explained. Thanks
I think the movie is recommended to left 30 seconds instead of 10s for the ending credit because the next suggestion video covers (in the left bottom side) in the summary time
Very interesting! Loved this video
Question regarding the books: volume 1 and volume 2 are considered as two separate editions for the same content or the same edition for different contents? Another way to ask: Should I but both volumes or just volume 2?
Thank you very much.
Great video!
Amazing video
1:15 what is that "object key" ? The row key that is being edited/added to the keyspace ?
amazing video!
Thank you! When you say sorted, is it by some object id or time?
Object ID, unless you design the system to use a high precision timestamp as the id, which could maybe be an interesting idea. If the Object ID is a timestamp, then object creations are already sorted which could further boost write performance, though there is probably no advantage in the case of updates or deletes.
This is a great video. If I want ti go further is there any good reference for nosql?
What would be a good example of an application benefiting from "fast write slow read" property of NoSQL DB? Based on what's presented, I'd say most user-facing application, like a typical service a startup would build (e.g. personal calendar organization, etc) doesn't sound like a good fit given reads are pretty important in user-facing traffic.
Freaking intuitive talk.
Thank you for the great video. Keep up the good work. ❤
What software do you use for animation in your videos?
I am very curious
I want to know as well. It is perfect!!!
@@wrondonparticual5113 Maybe this is something :-) ruclips.net/video/H5GETOP7ivs/видео.html
Awesome work, thanks for this!
Very cool. Could have
- one process just taking DB writes and putting them in memory
- another writes too-big variables to files on disk
- next would go through files and flatten them (like continuous truncate/shrink on a transaction log)
- last would take DB reads and go through the memory, bloom filters, and file structure to find and return the requested data.
How is a key-value actually stored and retrieved from disk? Although a SSTable is sorted within itself, since the keys in different SSTables are not related, how does a read request find a key? Surely not by trying all SSTables one by one, that'd be too slow.
Awesome overview.
Any idea how Domino databases work in this regard? Do they follow the same concepts? Is the compaction process "Tiering" or "Leveling"?
awesome explanation
I don't really understand them, but it's cool!
So much knowledge!!!
great explanation ❤
great explanation!
hi , can you do a video on how Splunk is use for devops and how it storing its data ?
Hi, i just wonder why the second volume has not been available in kindle yet :( i'm from Vietnam and shipping is expensive
Actually, many modern SQL are also LSM Tree based, it's not limited to no-sql
Is the concept of SS tables specific to a specific type of NOSQL database(Cassandra), or generic ?
Amazing Content! Does anyone know how these animations are made?
awesome video indeed! thank you
this is brilliant! I'm also wondering would this amount/level of knowledge for an advanced DS is enough for tech/system design interview? Obviously I think the interviewer won't ask for implementation so... ig im trying to know how deeper do we need to go than e.g. this channel's few minutes videos?
When searching why does it need to search at all levels? It needs to search at the latest level right? Can someone help me understand this.
The keys, when compacted, are deleted from their original level, and added to the next level.
Thanks! One of the things I find difficult to find good information on is structuring data. Given I want these x properties, how do so arrange the information to get them, and what technologies are required to do it.
yeah I freakin love this channel
Very good animation
Thank you!
The explanation of why SStables have to be sorted is not sufficient. In my opinion, the reason is that allowing the SSTables to be sorted allows faster merging and allows the index for the table to be sparse. It doesn't really allow faster table retrieval since running a binary search on disk is not so efficient.
6:56 correction regarding Bloom Filter
If the Key doesn't exist, bloom filter will NOT return False Negative
If the Key doesn't exist, bloom filter MAY return False Positive
In other words.
If bloom filter says there's NO key in page, there's definitely no key in page.
If it says that there's the key in page, there MIGHT be the key in page with some probability.
This probability data structure helps to reduce unnecessary reads.
That's exactly what is said in the video
Cassandra also has Leveled Compaction Strategy so that slide comparing it to RocksDB is a little misleading
Cassandra initially had size tiered only and later borrowed leveled from RocksDB to solve the space amplification problem, so it's not completely misleading.
Thank you so much!
relational databases don't use btrees, they use b+ trees. the only db i know of that uses btrees is actually mongodb.
No, mongodb use b+ tree
What is the animation software used? It is beautiful