Do read "Designing Data-Intensive Applications" by Martin Kleppmann and "Database Internals" by Alex Petrov for more in-depth analysis; they also have references to the original papers. All the best, keep learning.
It's not clear how the index table can be sorted when one key can be found in multiple SSTables and the compaction process is yet to run. There is also confusion about whether to search the index first or the Bloom filter on the SSTable. I would say there are a lot of missing pieces in the algo.
There’s a sparse index table per SortedSegment.
In search, you check the Bloom filter first; if it says NO, move to the next segment and ask its Bloom filter.
For each SSTable, there is a separate, dedicated sparse index. We search for a key from the latest SSTable to the oldest SSTable, one by one. It doesn't matter if SSTables overlap, as the last written value for a key will be read first.
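A minimal sketch of the read path described in the replies above, in Python: ask each segment's Bloom filter first, walk segments from newest to oldest, and return the first hit. The names `BloomFilter`, `Segment`, and `lookup` are made up for illustration, and each segment is held in memory as a dict instead of a file with a sparse index, so treat this as a toy model and not how any real engine does it.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 digests of the key.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class Segment:
    """One immutable sorted segment, kept in memory for illustration."""

    def __init__(self, items):
        self.data = dict(sorted(items.items()))
        self.bloom = BloomFilter()
        for key in self.data:
            self.bloom.add(key)


def lookup(segments_newest_first, key):
    """Return the most recently written value for key, or None if absent."""
    for segment in segments_newest_first:
        if not segment.bloom.might_contain(key):
            continue  # filter says NO: skip this segment entirely
        if key in segment.data:
            return segment.data[key]  # newest segment wins, stop here
    return None


old = Segment({"apple": 1, "mango": 2})
new = Segment({"apple": 10, "kiwi": 5})
print(lookup([new, old], "apple"))  # 10 (the newer write shadows the older one)
print(lookup([new, old], "mango"))  # 2
print(lookup([new, old], "grape"))  # None
```

Because the search stops at the first segment that holds the key, overlapping key ranges between segments do not cause stale reads.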
May I ask whether these data structures store the actual raw data in the database? Are they not indexing structures? I thought these were indexing strategies. Thank you for the video!
If a Bloom filter is present per SST, would we still need to search all SSTs?
@Rachit Jain: What about key deletion?
Please explain RocksDB
I read about this in DDIA. You explained it beautifully. Thanks
14:47, you wouldn't need to search through each of the 20 SSTs since they are sorted, right?
He is talking about the 20 individual SSTs which are ordered by TimeStamp
The problem is loading the 10kb sections inside each of those 20 segment files is costly. Looking up a key within that sorted section is fast, however, like you mention. But the operation that takes time is disk I/O.
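To make the point above concrete, here is a rough Python sketch of a sparse-index lookup inside one segment. The index keeps only the first key of each block, a binary search picks the block, and then exactly one block is loaded. `build_segment`, `find`, and the `block_reads` counter are hypothetical names, and blocks are plain lists standing in for the 10 KB sections on disk.

```python
import bisect


def build_segment(sorted_items, block_size=2):
    """Split sorted (key, value) pairs into blocks; index each block's first key."""
    blocks = [
        sorted_items[i:i + block_size]
        for i in range(0, len(sorted_items), block_size)
    ]
    sparse_index = [block[0][0] for block in blocks]
    return sparse_index, blocks


def find(sparse_index, blocks, key, stats):
    """Look up key via the in-memory sparse index, loading at most one block."""
    # Rightmost block whose first key is <= key.
    i = bisect.bisect_right(sparse_index, key) - 1
    if i < 0:
        return None  # key sorts before the first block, no read needed
    stats["block_reads"] += 1  # stands in for one disk read of a block
    for k, v in blocks[i]:
        if k == key:
            return v
    return None


items = [("a", 1), ("c", 2), ("e", 3), ("g", 4), ("i", 5)]
sparse_index, blocks = build_segment(items)
stats = {"block_reads": 0}
print(find(sparse_index, blocks, "e", stats))  # 3
print(find(sparse_index, blocks, "f", stats))  # None, but a block was still read
print(stats["block_reads"])                    # 2
```

The search inside a block is cheap; the cost is the one block read per segment, which with 20 segments can mean up to 20 reads for a key that is old or missing.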
I am amazed to see, how you simply explain such a tough topic😍
Why is the latest segment read first for a read operation? Because there could be updates to the same keys and we want to read the latest value.
Bow down to u sir 🙇🙇 Havent seen so detailed video like this ❤️ Loved it . 🙏🙏
With many distinct keys, the background thread that performs compaction will affect read performance, which is one of the limitations of LSM.
Also, the same key has to be written many times: first in the memtable, then in a segment, and again while merging segments.
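A small Python sketch of the merge step mentioned above, assuming each segment is a sorted list of (key, value) pairs and that the newer segment wins on duplicate keys. `compact` is a made-up name for illustration, not any particular database's API.

```python
def compact(older, newer):
    """Merge two sorted segments into one; on duplicate keys the newer value wins."""
    merged = []
    i = j = 0
    while i < len(older) and j < len(newer):
        if older[i][0] < newer[j][0]:
            merged.append(older[i])
            i += 1
        elif older[i][0] > newer[j][0]:
            merged.append(newer[j])
            j += 1
        else:
            # Same key in both segments: keep only the newer value.
            merged.append(newer[j])
            i += 1
            j += 1
    # One side is exhausted; copy whatever remains of the other.
    merged.extend(older[i:])
    merged.extend(newer[j:])
    return merged


older = [("a", 1), ("b", 2), ("d", 4)]
newer = [("b", 20), ("c", 3)]
print(compact(older, newer))  # [('a', 1), ('b', 20), ('c', 3), ('d', 4)]
```

Note that key "a" was never updated, yet it is written again into the merged segment. That rewrite, on top of the memtable write and the original flush, is the repeated writing described above.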
Looks like the sorting key in the SSTable is a string, whereas in the memtable it is a number. That might confuse some people.
Sir, it's amazing. Thank you very much for your valuable time
Hi, do these competitions for working professionals actually help you in a job switch? There's one on D2C - Porter's hiring challenge. They'll give winners a package of 29 LPA.
Is it worth applying for it? If yes, please make a video for guidance.
Hi Rachit. Nice video and explanation. But one doubt: I have read in the DDIA book that data segments in an LSM structure are of variable size (not always fixed at 10 KB as mentioned), whereas only B-tree page-oriented structures are of fixed size (each page is of fixed size). Please correct me if I am wrong.
Sir, I am a 2018 electrical engineering passout. I want to start a career in IT. I studied C++ in college. Where do you think I should start? I know coding is absolutely necessary.
Thanks for this video. Can you please make one on the Flipkart Runway competition? It's exclusively for females and will give internship opportunities to the winners. I am in my final year and really want to try my best for this competition. Please guide.
Hello, thanks for this amazing explanation. I am confused about when sparse indexing and normal indexing are used in segments. In your example we have 2 segments, both sorted, and the hashing may overlap; in such a case sparse indexing will not help.
Thank you sir, it's really helpful for me❤️
Very nice video, can you please share the slides you used in the above video?
Very nicely explained! Looking forward to more of such system /backend videos.
Great job bro, how can I get these notes you wrote in the video
Thanks for clear explanations... good job
Great explanation, thank you!
Hi Rachit, we are finding these videos quite helpful. It would be even better if you could share these notes 😃
"Database Internals" by Alex Petrov
Very well explained :)
Good content
Awesome ❤️