05 - Buffer Pools + Memory Management (CMU Databases Systems / Fall 2019)

Поделиться
HTML-код
  • Опубликовано: 26 янв 2025

Комментарии • 44

  • @kuntaliem
    @kuntaliem 5 лет назад +11

    Thank you for the awesome video. I guess PostgreSQL also supports some form of scan sharing. For each relation, it keeps track of the starting location (implemented in ss_get_location()) from where a scan should begin. It tries to synchronize multiple scans on the same relation.

  • @marcoq7160
    @marcoq7160 4 года назад +42

    0:15 DJ Drop Tables has some problems
    0:48 Administrivia
    1:18 Database workloads
    1:47 Bifurcated environment: OLTP data silos -> Extract-Transform-Load (ETL) -> OLAP data warehouse
    3:17 HTAP shifts some of the analytical queries to the OLTP side
    3:59 Database storage
    4:54 Spatial vs Temporal control
    5:48 Disk-oriented DMBS
    6:49 Today's agenda: buffer pool manager, replacement policies, other memory pools
    7:22 Buffer pool organization: array of fixed-size frames
    9:11 Buffer pool meta-data: page table, dirty flag, pin counter
    12:30 Locks (for DB's logical contents) vs latches (for internal structures, mutex)
    15:19 Page table (in-memory) vs Page directory (on disk)
    16:30 Allocation policies: global vs local policies
    17:53 Buffer pool optimizations: multiple buffer pools, pre-fetching, scan sharing, buffer pool bypass
    18:42 Multiple buffer pools (reduce latch contention and improve locality)
    21:26 Multiple buffer pools: 1) object ID, 2) hashing
    22:58 Pre-fetching (based on a query plan)
    28:24 Scan sharing
    32:55 Where do intermediate query results go?
    35:07 Reminder: relational model is unordered
    36:05 A question from student
    36:44 Buffer pool bypass
    38:21 OS page cache (use direct I/O, O_DIRECT to bypass it)
    40:13 Postgres demo
    55:06 Why don't other systems use OS page cache as Postgres does? (hard to guarantee consistent behavior across different OSs)
    58:19 Buffer replacement policies
    1:00:15 Least-recently used (LRU)
    1:01:00 Clock (approximation of LRU)
    1:03:36 Sequential flooding
    1:05:23 Better policies: LRU-K (estimates the interarrival times of references on a page by page basis)
    1:06:21 Better policies: localization (evict on a per txn/query basis)
    1:07:10 Better policies: priority hints
    1:08:39 Dirty pages (trade-off)
    1:10:10 Background writing
    1:11:06 Other memory pools
    1:11:20 Conclusion
    1:11:47 Project #1
    1:12:45 Task #1 - clock replacement policy
    1:13:37 Task #2 - buffer pool manager
    1:14:23 Getting started

  • @linziye7714
    @linziye7714 5 лет назад +17

    precious resource for database! Thank you CMU!

  • @anuragshah3433
    @anuragshah3433 Год назад

    Why did query performance not improve post moving every tuple to buffer pool at 49:13? Time stamp of query run with tuples being fetched from disk - 43:37.

  • @AndersonSilva-dg4mg
    @AndersonSilva-dg4mg 5 лет назад +8

    Hooray! new video. Thankyou very much!

  • @roly4301
    @roly4301 5 лет назад +5

    Thank you making available these high-quality lectures and programming assignments. Was wondering if there's any chance you could release the grading scripts for previous offerings of the course?

    • @andypavlo
      @andypavlo 5 лет назад +12

      We're working on it.

    •  4 года назад

      Yes. I love to. This is an another great resource for this amazing course.

    • @AshishNegi1618
      @AshishNegi1618 4 года назад

      @Vipul Bhardwaj i am also interested. But i think concern for Andy is that online discussion forum will cause solutions to leak.
      we can privately discuss in our emails.

  • @anikethdeshpande8336
    @anikethdeshpande8336 2 года назад

    @4:33 ---> which is the new hardware where we can push the execution logic to disk ?

    • @andypavlo
      @andypavlo 2 года назад

      pages.cs.wisc.edu/~jignesh/publ/SmartSSD.pdf

    • @rodypar317
      @rodypar317 2 года назад

      ruclips.net/p/PL5Q2soXY2Zi-Mnk1PxjEIG32HAGILkTOF

  • @ljratner
    @ljratner 5 лет назад +3

    Slide 16 is missing from the PDF.

  • @yashwinenamadi1673
    @yashwinenamadi1673 4 года назад +1

    Why does any os allow DBMS to maintain its own memory i.e., the buffer pool? DBMS is a mere process running on the os. Can't the buffer pool be swapped out of memory by the os to accommodate other processes' memory? Or is there a hardware support for this buffer pool dedicated to be managed by a DBMS software?

    • @meamzcs
      @meamzcs 4 года назад

      Sure the OS can but that's not that DBs job to worry about. The OS does its own bookkeeping to determine what memory pages it swaps and if it's the buffer pool that's swapped out that means most likely every other page in memory is better to have in memory than the buffer pool... And once the DBMS accesses the buffer pool again you will get a page fault anyway which will lead to the OS pulling it back into memory...

    • @sebastiannyberg3008
      @sebastiannyberg3008 2 года назад

      The memory used by the buffer pool is no different from any other virtual memory given by sbrk. The buffer pool memory can be "swapped out" to a swap partition to avoid OOMkill, but not written out by the OS like pages from a file opened without O_DIRECT.
      Note that with a DBMS you usually don't want swap partitions due to the hard-to-debug performance degradation.

    • @sebastiannyberg3008
      @sebastiannyberg3008 2 года назад

      Note that the Linux kernel used to separate the buffer cache from the page cache, but the two were merged some ways back. The OS obviously needs to bring in the blocks from disk, and does some (minimal) caching even when using O_DIRECT.
      You can read more about the O_DIRECT flag in the open(2) syscall manpage

  • @bisakhmondal8371
    @bisakhmondal8371 4 года назад +5

    Coolest Prof. ever. Thanks for the awesome contents.
    Btw who is "tk"(he/she is in every video)!! XD

  • @drevil7vs13
    @drevil7vs13 4 года назад

    30:58 I wonder if Oracle supports cursor sharing for 2 queries with the same sql_id but with different bind variable values

    • @moazeldefrawy4379
      @moazeldefrawy4379 4 года назад

      It probably does? since the values for each row are stored together (aka .. it's a row database)

  • @prathibhapb
    @prathibhapb 3 года назад

    22:50 - How does Approach #2 (Hashing) help if each database has its own buffer pool?

    • @zhenghanghu2430
      @zhenghanghu2430 2 года назад +1

      I think the point is if you don't have the one-buffer-pool-per-table set up, then you can use hashing to find out the buffer pool each page belongs to. If the one-buffer-pool-per-table set up is applied, then you don't need hashing. (Or, you can have multiple buffer pools for each table, that way hashing can be useful.) We also see in the lecture we can have a buffer pool for each query, that case we don't need hashing neither.

  • @williamweiler449
    @williamweiler449 3 года назад +1

    How do multiple buffer pools reduce latch contention if threads still must contend for latches on the same page?

    • @rofamohamed2398
      @rofamohamed2398 3 года назад +1

      Each buffer pool has a page table related to it, so you have multiple buffer pools and also multiple page tables, which will reduce the latch contention due to we increased the number of page tables.
      Hope i cleared it enough.

    • @williamweiler449
      @williamweiler449 3 года назад

      @@rofamohamed2398 Ah, so latches are placed on the page table, not the buffer page itself?
      thanks for the reply

    • @rofamohamed2398
      @rofamohamed2398 3 года назад +1

      @@williamweiler449 Yeah, latches are placed on the page table, to prevent concurrent access to the position that will hold the pointer to the buffer pool frame, which will be helpful when we need to get a specific page to know which frame the page resides in and that's the job of page table (to know the positions of the pages inside the buffer pool).

  • @himanshuhbk953
    @himanshuhbk953 5 лет назад +5

    Can non-CMU students submit these assignments ?

    • @jcdyer3
      @jcdyer3 4 года назад +6

      Probably not, that's asking for a big donation of unpaid work from the course staff, but you can do them yourself, and then research the answers to check if you were right.

  • @drakenguyen2390
    @drakenguyen2390 4 года назад +1

    thank you for this high quality content. This is a long shot but is there anyway we can have a solution for the project? Like sample code for for eg.

  • @alielzahaby3315
    @alielzahaby3315 2 года назад

    Should I implement the assignments when im listening online or is it additional thing?

    • @alielzahaby3315
      @alielzahaby3315 2 года назад

      Like would it matter that much with understanding the course?

  • @sheikhmuhammedshoaib1124
    @sheikhmuhammedshoaib1124 4 года назад

    Can we get project source code for reference?

  • @willw2596
    @willw2596 3 года назад

    I'm not buying the benefit of pre-fetching, that assumes the pages on the disk are prearranged to be adjacent for the particular query type. In real life, pages are allocated in random in regarding to the ordering of the data. The scanning of the index pages looks dubious. B+Tree pages are only allocated when split, which happens randomly based on the data inserted. I mean you can say this big chunk of pages have all the index pages and load all of them at one shot, but so can mmap to load the big chunk of pages at one shot.

  • @peijunwu7354
    @peijunwu7354 2 года назад

    23:00

  • @arhyth
    @arhyth 5 лет назад +1

    do distributed databases also have buffer pools? i would assume they don't. as the communication cost of keeping members in sync with each other defeats the use of buffer pools.

    • @adamjaffe1712
      @adamjaffe1712 5 лет назад +6

      I think each node in the distributed database will manage its own buffer pool. There is no need to synchronize it since the buffer pool is an optimization to avoid disk access - it doesn't affect the correctness of the underlying data.

    • @AshishNegi1618
      @AshishNegi1618 4 года назад +2

      currently in this lecture we are studying single machine database internal details. Different machines don't share memory and hence any of buffer pools etc data structures. Buffer pools is way to access 100 GB database file with 1 GB ram - like virtual memory - without higher layers knowing about it.

    • @prathibhapb
      @prathibhapb 3 года назад

      Distributed databases can still have node local buffer pools as they have some data storage in each node.

  • @tianhaowang4045
    @tianhaowang4045 3 года назад

    rarely a video with 0 thumb-down

  • @rosswang688
    @rosswang688 3 года назад

    rarely a vid with 0 thumb down.