I graduated from CMU without taking this course, which I have been most regretful of. Thanks for posting this course Andy! It's the best-ever learning source for database! Please ignore those malicious comments!
Thank you for this awesome video. Some updates about recent Postgres changes - In PG, we actually append the heap TID with the key. This helps us to resolve the ordering problem of duplicates, also it maintains the same physical order as in the heap pages. We truncate away the suffix key attributes including the heap TID during a leaf page split.
if b_tree page size is like 1 MB, greater than atomic disk write max page size 4kb , how is the consistency maintained during writes + crash ? probably might learn it in future class
Neat thing is that interpolation is also helpful for bisection search, not just for establishing the starting point for linear search (depending on the dataset, a linear search after a distribution based interpolation can still be expensive, so one can do multiple interpolation hops on ever smaller intervals)
Hi. During your presentation of b+trees you struggled with size of the web page. In Firefox you can press (ctrl + '+') to zoom in page and (ctrl + '-') to zoom out; ctrl + 0 will make page 100% zoom. In url bar you can see current zoom level (200%)
Since Andy asked, I actually do have a question about project 1. In the lecture, it is stated that ClockReplacer should check if a page has been referenced lately, but none of the methods in Replacer update the ref flag on its own. Could we just implement the clock using a linked list, where a newly unpinned page is placed immediately *behind* the clock hand so that it is considered as late as possible for victimization? Or would this not work because the ref flag can be updated somehow? Is it perhaps possible to unpin a page multiple times? I do see this case in the unit tests, but I don't understand why the buffer pool manager would ever unpin a page twice without pinning it in between.
At slide 14, you're saying that Postgresql only uses Record IDs in indexes, but aren't indeces with INCLUDES clauses implemented by embedding additional data into the index itself?
Hey, I have a question. In slide 36, he concluded that it is better to store the page address rather than the page no. so that we don't have to go to the Buffer Pool Manager so that we can directly access the page directly into memory. So the question is- 1. I think that the page allocation and de-allocation things are handled by the Buffer Pool manager but he is saying that we can directly use the pointer to the page instead of page numbers 2. If we are going to get the page directly allocated without the use of Buffer pool manager then it may happen that we have copies of the same page in the memory and that may cause concurrency problem on that page. because that page may be updated differently by different operations of DBMS. If anyone has any answer to the questions please let me know or like this comment to get other people noticed this question and we can have our answers.
I think maybe the pages which store the keys have higher privilege to be in the buffer pool than any other kinds of pages. Because the DBMS uses index to query the data in most cases.
He talks about it at the end of lecture. You ask buffer pool manager for the first time for the page. Once you get the page, you pin it. Now, buffer pool manager can't deallocate it. You can use the same Page* multiple times. This does not sound useful for leaf nodes. But this is useful for internal nodes at least for first 2 levels, which are always getting accessed and are hot. So, pin them as soon as db start getting request. Now, you never need to go to the buffer pool manager for Page* as they are not going to get evicted.
why people say he has hygiene issue??? back to the first video when he is in his tub... he looks so clean and has zero hygiene problem... why some people are twisted like this...
Prof. Andy Pavlo is the best prof. I never had, thanks for uploading these to YT
Rarely give feedback on videos -- but really really really well done and fun lectures -- thank you Andy!
0:27 RUclips feedback
1:17 SHA-256 is not reversible
2:04 Administrivia
2:50 Data structures
3:37 Table indexes
7:18 Table indexes trade-offs
8:24 Today's agenda: B+Tree, design decisions, optimizations
8:50 B-Tree family
11:27 B+Tree
13:26 B+Tree properties
15:00 B+Tree example
16:59 B+Tree node is an array of key/value pairs
17:43 B+Tree leaf nodes
20:25 Leaf node values: record IDs, tuple data
21:36 B-Tree vs B+Tree
25:24 B+Tree insert
27:10 B+Tree insert demo
33:07 B+Tree delete
35:22 B+Tree delete demo
36:35 How do you merge leaf nodes?
38:42 B+Trees in practice
39:45 Clustered indexes
41:57 Selection conditions
46:35 B+Tree design choices
47:25 Node size (the slower, the larger)
49:15 Merge threshold
50:45 Variable length keys
55:00 Key map / indirection
1:01:31 Non-unique indexes
1:02:04 Non-unique: duplicate keys (more common)
1:02:24 Non-unique: value lists
1:04:45 Intra-node search (linear, binary, interpolation)
1:07:06 Optimizations
1:07:18 Prefix compression
1:10:28 Suffix truncation
1:11:46 Bulk insert: 1) Sort keys 2) Build index from the bottom up
1:14:04 Pointer swizzling
1:16:43 B+Tree is awesome
beast
10:28 bleep
I graduated from CMU without taking this course, which I have been most regretful of. Thanks for posting this course Andy! It's the best-ever learning source for database! Please ignore those malicious comments!
Your handling of the RUclips comments at the beginning will make you legendary in my mind forever! That was just amazing.
Thank you prof for making avaiable this video to poor countries who dont have access to this kind of schools
Awesome content!
My mentor #hussein 🙌🙌 please join him up #bestBackend
Thanks for giving this lectures public available. I couldn't find any similar class in my country so what you are doing is really awesome :D
Best DB course I have ever seen! Learned a lot!
B-tree visualisation: www.cs.usfca.edu/~galles/visualization/BPlusTree.html
Fantastic lectures! Andy ,thank you for sharing your knowledge!
Thank you for this awesome video. Some updates about recent Postgres changes - In PG, we actually append the heap TID with the key. This helps us to resolve the ordering problem of duplicates, also it maintains the same physical order as in the heap pages. We truncate away the suffix key attributes including the heap TID during a leaf page split.
Thank you very much for this Prof. Andy Pavlo. Great learning...!!! Like your lecture & attitude...!!!
Thank you for uploading the videos and the lecture notes. Really, really helpful and useful to a lot of people.
They are jealous from the successful people, that's the reason behind posted these comment
Good job professor and keep going
Thank you so much prof. Andy, I really like your teaching style and I believe what you are doing is awesome. Thanks again :)
45:16 What if the rightmost leaf node contains (C, C), (C, D), (D, B)?
In this case, (*, B) will still have to traverse the rightmost leaf node.
Oh, a student later asked this question.
Hey, I found a paper that might be relevant 1:13:40 but it talks aboud B-tree: Online B-Tree Merging. Is it what Andy refer to in the video?
Great video, I've learned a lot. Thanks for uploading these videos!
if b_tree page size is like 1 MB, greater than atomic disk write max page size 4kb , how is the consistency maintained during writes + crash ? probably might learn it in future class
Neat thing is that interpolation is also helpful for bisection search, not just for establishing the starting point for linear search (depending on the dataset, a linear search after a distribution based interpolation can still be expensive, so one can do multiple interpolation hops on ever smaller intervals)
Hi. During your presentation of b+trees you struggled with size of the web page. In Firefox you can press (ctrl + '+') to zoom in page and (ctrl + '-') to zoom out; ctrl + 0 will make page 100% zoom. In url bar you can see current zoom level (200%)
in var length key ,how we gonna find the leaf node if we have a key ?! if we only store the adderss, i think i am missing something here
why does etcd uses b tree instead of b+ tree for its tree index?
Since Andy asked, I actually do have a question about project 1. In the lecture, it is stated that ClockReplacer should check if a page has been referenced lately, but none of the methods in Replacer update the ref flag on its own. Could we just implement the clock using a linked list, where a newly unpinned page is placed immediately *behind* the clock hand so that it is considered as late as possible for victimization? Or would this not work because the ref flag can be updated somehow? Is it perhaps possible to unpin a page multiple times? I do see this case in the unit tests, but I don't understand why the buffer pool manager would ever unpin a page twice without pinning it in between.
At slide 14, you're saying that Postgresql only uses Record IDs in indexes, but aren't indeces with INCLUDES clauses implemented by embedding additional data into the index itself?
Really learn so much~
11:34 Wikipedia mentions that B-tree authors never explained what the letter _B_ stands for :)
en.wikipedia.org/wiki/B-tree#Origin
i love this course!!!
Hey, I have a question. In slide 36, he concluded that it is better to store the page address rather than the page no. so that we don't have to go to the Buffer Pool Manager so that we can directly access the page directly into memory. So the question is-
1. I think that the page allocation and de-allocation things are handled by the Buffer Pool manager but he is saying that we can directly use the pointer to the page instead of page numbers
2. If we are going to get the page directly allocated without the use of Buffer pool manager then it may happen that we have copies of the same page in the memory and that may cause concurrency problem on that page. because that page may be updated differently by different operations of DBMS.
If anyone has any answer to the questions please let me know or like this comment to get other people noticed this question and we can have our answers.
I think maybe the pages which store the keys have higher privilege to be in the buffer pool than any other kinds of pages. Because the DBMS uses index to query the data in most cases.
He talks about it at the end of lecture.
You ask buffer pool manager for the first time for the page. Once you get the page, you pin it. Now, buffer pool manager can't deallocate it. You can use the same Page* multiple times. This does not sound useful for leaf nodes. But this is useful for internal nodes at least for first 2 levels, which are always getting accessed and are hot. So, pin them as soon as db start getting request. Now, you never need to go to the buffer pool manager for Page* as they are not going to get evicted.
why people say he has hygiene issue??? back to the first video when he is in his tub... he looks so clean and has zero hygiene problem... why some people are twisted like this...
it's probably a joke