Slowly Changing Dimensions - The Ultimate Guide

Поделиться
HTML-код
  • Опубликовано: 28 июл 2024
  • Free SQL Pattern Training: etlsql.kartra.com/page/sps-fr...
    Do you understand SQL but still cannot write SQL queries correctly ?
    Are you afraid to switch jobs because of the SQL interview rounds or have you failed SQL Interview earlier ?
    If you are someone who understands SQL and need more confidence to solve SQL queries correctly, I would like to share my recent online course especially created for you.
    Check for more details: etlsql.kartra.com/page/sql-pa...
    In this video I have explained what is Slowly Changing Dimension tables. Why do we need it and how do we create it ?
    We will also talk about Table design consideration for SCD tables and how do we implement SCD-1, SCD-2 & SCD-3 tables using SQL.
    A Dimension table stores attributes and descriptive details about the measures calculated in the Fact tables. With time it is expected that some changes will happen to these attributes stored in the Dimension tables. Hence the word "Changing".
    However those changes may not be very frequent. Thereby adding the word "Slowly".
    The Dimension table which is expecting some changes over time is called Slowly Changing Dimension.
    Link to the blog referred in the video:
    etl-sql.com/slowly-changing-d...
    Video Timeline:
    00:00 Introduction
    00:50 SCD Definition
    01:40 Different Types of SCD - 1,2,3
    05:00 SCD Implementation - Table design consideration
    10:40 SCD Implementation - SQL Update & Insert
    15:15 Day 1 - Staging
    15:57 Day 1 - SCD-1
    16:39 Day 1 - SCD-2
    17:00 Day 1 - SCD-3
    18:00 Day 2 - Staging
    18:38 Day 2 - SCD-1
    19:43 Day 2 - SCD-2
    21:55 Day 2 - SCD-3
    23:06 Day 3 - Staging
    23:24 Day 3 - SCD-1
    24:08 Day 3 - SCD-2
    25:08 Day 3 - SCD-3
    Leave your comments below.
    Practice SQL questions on Data Lemur platform.
    I will highly recommend to sign up for this platform.
    I am sharing my referral link below for easy reference.
    bit.ly/3SuF3wf

Комментарии • 7

  • @chinmaypawar4801
    @chinmaypawar4801 Год назад

    Amazing explanation! Looking forward for more such videos.

    • @ETLSQL
      @ETLSQL  Год назад

      Glad you liked it.

  • @sassydesi7913
    @sassydesi7913 2 месяца назад +1

    very nice. pls make for Different types of FACT tables too (Aggregated/Snapshot/Null etc.etc.)

    • @ETLSQL
      @ETLSQL  2 месяца назад

      glad you liked it. sure, added to the list.

  • @abdelrahmanmangoud1139
    @abdelrahmanmangoud1139 10 месяцев назад

    Thank you for the clear and decent explanation ,
    How could we handle the SCD 2 in case of we run day (2) multiple times on the same day ?
    we may need to add an additional condition on the update statement to compare the STG date and DWH start date ( STG Start Date != DWH Start Date)
    Also for the insertion do we need to get the delta to avoid duplication ?
    so the insertion condition should be
    INSERT INTO DWH VALUES()
    SELECT
    FROM Source
    WHERE NOT EXISTS (
    SELECT 1
    FROM DWH AS D
    WHERE DWH.KEY = Source.Key
    AND D.END_DATE = '2099'
    )
    Thanks again for the explanation , it helped a lot :)

    • @ETLSQL
      @ETLSQL  10 месяцев назад

      Happy to help.
      There is one assumption in the staging table that it has only delta records. This means that only new or updated records come in the staging table. So even if we run the pipeline multiple times within a day, we can safely run the same update+insert combination. Since we are using timestamp column, it will be able to track the changes with in the same day as well.
      If the staging has all the records for the day then in each run I will read the most recent entry for each key and will use that as the delta records.
      Like row_number() over (partition by key order by insert_ts desc) = 1
      This will give me most recent entry for each key from the staging table.
      Now I will use the output in my update and insert query as source data.
      Hope this helps.

    • @abdelrahmanmangoud1139
      @abdelrahmanmangoud1139 10 месяцев назад

      @@ETLSQL it does , thank you so much