07 - Tree Indexes I (CMU Databases Systems / Fall 2019)

Поделиться
HTML-код
  • Опубликовано: 25 янв 2025

Комментарии • 38

  • @electron0zero
    @electron0zero 5 лет назад +61

    Prof. Andy Pavlo is the best prof. I never had, thanks for uploading these to YT

  • @divjyotsethi2695
    @divjyotsethi2695 4 года назад +34

    Rarely give feedback on videos -- but really really really well done and fun lectures -- thank you Andy!

  • @marcoq7160
    @marcoq7160 4 года назад +25

    0:27 RUclips feedback
    1:17 SHA-256 is not reversible
    2:04 Administrivia
    2:50 Data structures
    3:37 Table indexes
    7:18 Table indexes trade-offs
    8:24 Today's agenda: B+Tree, design decisions, optimizations
    8:50 B-Tree family
    11:27 B+Tree
    13:26 B+Tree properties
    15:00 B+Tree example
    16:59 B+Tree node is an array of key/value pairs
    17:43 B+Tree leaf nodes
    20:25 Leaf node values: record IDs, tuple data
    21:36 B-Tree vs B+Tree
    25:24 B+Tree insert
    27:10 B+Tree insert demo
    33:07 B+Tree delete
    35:22 B+Tree delete demo
    36:35 How do you merge leaf nodes?
    38:42 B+Trees in practice
    39:45 Clustered indexes
    41:57 Selection conditions
    46:35 B+Tree design choices
    47:25 Node size (the slower, the larger)
    49:15 Merge threshold
    50:45 Variable length keys
    55:00 Key map / indirection
    1:01:31 Non-unique indexes
    1:02:04 Non-unique: duplicate keys (more common)
    1:02:24 Non-unique: value lists
    1:04:45 Intra-node search (linear, binary, interpolation)
    1:07:06 Optimizations
    1:07:18 Prefix compression
    1:10:28 Suffix truncation
    1:11:46 Bulk insert: 1) Sort keys 2) Build index from the bottom up
    1:14:04 Pointer swizzling
    1:16:43 B+Tree is awesome

  • @johnzhong5147
    @johnzhong5147 4 года назад +8

    I graduated from CMU without taking this course, which I have been most regretful of. Thanks for posting this course Andy! It's the best-ever learning source for database! Please ignore those malicious comments!

  • @garfieldnate
    @garfieldnate 4 года назад +5

    Your handling of the RUclips comments at the beginning will make you legendary in my mind forever! That was just amazing.

  • @mudasir989
    @mudasir989 5 лет назад +8

    Thank you prof for making avaiable this video to poor countries who dont have access to this kind of schools

  • @hnasr
    @hnasr 4 года назад +17

    Awesome content!

    • @ifyugwumba8120
      @ifyugwumba8120 3 года назад

      My mentor #hussein 🙌🙌 please join him up #bestBackend

  • @carlochess
    @carlochess 5 лет назад +17

    Thanks for giving this lectures public available. I couldn't find any similar class in my country so what you are doing is really awesome :D

  • @boronglyu3126
    @boronglyu3126 4 года назад +8

    Best DB course I have ever seen! Learned a lot!

  • @Qladstone
    @Qladstone 3 года назад +7

    B-tree visualisation: www.cs.usfca.edu/~galles/visualization/BPlusTree.html

  • @milenkomarkovic
    @milenkomarkovic 4 года назад +7

    Fantastic lectures! Andy ,thank you for sharing your knowledge!

  • @kuntaliem
    @kuntaliem 5 лет назад +2

    Thank you for this awesome video. Some updates about recent Postgres changes - In PG, we actually append the heap TID with the key. This helps us to resolve the ordering problem of duplicates, also it maintains the same physical order as in the heap pages. We truncate away the suffix key attributes including the heap TID during a leaf page split.

  • @ananthasubramanian7355
    @ananthasubramanian7355 4 года назад +2

    Thank you very much for this Prof. Andy Pavlo. Great learning...!!! Like your lecture & attitude...!!!

  • @oisintheginger
    @oisintheginger 2 года назад

    Thank you for uploading the videos and the lecture notes. Really, really helpful and useful to a lot of people.

  • @Baraa8511
    @Baraa8511 4 года назад +1

    They are jealous from the successful people, that's the reason behind posted these comment
    Good job professor and keep going

  • @AIThoughts_MA
    @AIThoughts_MA 2 года назад

    Thank you so much prof. Andy, I really like your teaching style and I believe what you are doing is awesome. Thanks again :)

  • @j50313j50313
    @j50313j50313 3 года назад

    45:16 What if the rightmost leaf node contains (C, C), (C, D), (D, B)?
    In this case, (*, B) will still have to traverse the rightmost leaf node.

    • @j50313j50313
      @j50313j50313 3 года назад

      Oh, a student later asked this question.

  • @温东博
    @温东博 3 года назад

    Hey, I found a paper that might be relevant 1:13:40 but it talks aboud B-tree: Online B-Tree Merging. Is it what Andy refer to in the video?

  • @wjliu4328
    @wjliu4328 3 года назад +1

    Great video, I've learned a lot. Thanks for uploading these videos!

  • @AshishNegi1618
    @AshishNegi1618 4 года назад +1

    if b_tree page size is like 1 MB, greater than atomic disk write max page size 4kb , how is the consistency maintained during writes + crash ? probably might learn it in future class

  • @monfera
    @monfera 5 лет назад

    Neat thing is that interpolation is also helpful for bisection search, not just for establishing the starting point for linear search (depending on the dataset, a linear search after a distribution based interpolation can still be expensive, so one can do multiple interpolation hops on ever smaller intervals)

  • @antonlastauka4898
    @antonlastauka4898 4 года назад +1

    Hi. During your presentation of b+trees you struggled with size of the web page. In Firefox you can press (ctrl + '+') to zoom in page and (ctrl + '-') to zoom out; ctrl + 0 will make page 100% zoom. In url bar you can see current zoom level (200%)

  • @abdelrhmanahmed1378
    @abdelrhmanahmed1378 3 года назад

    in var length key ,how we gonna find the leaf node if we have a key ?! if we only store the adderss, i think i am missing something here

  • @pengdu7751
    @pengdu7751 2 года назад

    why does etcd uses b tree instead of b+ tree for its tree index?

  • @garfieldnate
    @garfieldnate 4 года назад

    Since Andy asked, I actually do have a question about project 1. In the lecture, it is stated that ClockReplacer should check if a page has been referenced lately, but none of the methods in Replacer update the ref flag on its own. Could we just implement the clock using a linked list, where a newly unpinned page is placed immediately *behind* the clock hand so that it is considered as late as possible for victimization? Or would this not work because the ref flag can be updated somehow? Is it perhaps possible to unpin a page multiple times? I do see this case in the unit tests, but I don't understand why the buffer pool manager would ever unpin a page twice without pinning it in between.

  • @GolerGkA
    @GolerGkA 4 года назад

    At slide 14, you're saying that Postgresql only uses Record IDs in indexes, but aren't indeces with INCLUDES clauses implemented by embedding additional data into the index itself?

  • @孟裕燊
    @孟裕燊 2 года назад

    Really learn so much~

  • @drevil7vs13
    @drevil7vs13 4 года назад

    11:34 Wikipedia mentions that B-tree authors never explained what the letter _B_ stands for :)
    en.wikipedia.org/wiki/B-tree#Origin

  • @avory6707
    @avory6707 4 года назад +1

    i love this course!!!

  • @sanjibsah455
    @sanjibsah455 5 лет назад

    Hey, I have a question. In slide 36, he concluded that it is better to store the page address rather than the page no. so that we don't have to go to the Buffer Pool Manager so that we can directly access the page directly into memory. So the question is-
    1. I think that the page allocation and de-allocation things are handled by the Buffer Pool manager but he is saying that we can directly use the pointer to the page instead of page numbers
    2. If we are going to get the page directly allocated without the use of Buffer pool manager then it may happen that we have copies of the same page in the memory and that may cause concurrency problem on that page. because that page may be updated differently by different operations of DBMS.
    If anyone has any answer to the questions please let me know or like this comment to get other people noticed this question and we can have our answers.

    • @zhiyizhang1616
      @zhiyizhang1616 4 года назад

      I think maybe the pages which store the keys have higher privilege to be in the buffer pool than any other kinds of pages. Because the DBMS uses index to query the data in most cases.

    • @AshishNegi1618
      @AshishNegi1618 4 года назад

      He talks about it at the end of lecture.
      You ask buffer pool manager for the first time for the page. Once you get the page, you pin it. Now, buffer pool manager can't deallocate it. You can use the same Page* multiple times. This does not sound useful for leaf nodes. But this is useful for internal nodes at least for first 2 levels, which are always getting accessed and are hot. So, pin them as soon as db start getting request. Now, you never need to go to the buffer pool manager for Page* as they are not going to get evicted.

  • @Max-my6rk
    @Max-my6rk 5 лет назад +3

    why people say he has hygiene issue??? back to the first video when he is in his tub... he looks so clean and has zero hygiene problem... why some people are twisted like this...