Data modeling interview filters so many data engineers! How to model slowly-changing dimensions

  • Published: Feb 3, 2025

Comments • 27

  • @SnazzyKicks
    @SnazzyKicks 10 months ago +9

    This is great and to the point. But please add more on this topic: some challenges/real-world examples. It would be a great help to a lot of people in the DE community @Zach

  • @ColmeiaX
    @ColmeiaX 22 days ago +1

    Now I definitely understand SCD
    Thank you Zach !

  • @InsightsofAJ
    @InsightsofAJ 1 year ago +4

    Great one. I would say that learning normalization is a good start.

    • @Levy957
      @Levy957 1 year ago +2

      de-normalization too

  • @jonskaggs2891
    @jonskaggs2891 11 months ago +11

    I’ve always wanted to model data, but I’ve never had the right dimensions for it. 🤓

  • @vamau40
    @vamau40 1 year ago +2

    Good one, keep making these plz

  • @denist80
    @denist80 1 year ago +6

    Hi Zach, thanks for sharing your thoughts. I just wanted to see if there was a mistyped last name in your CC (0:16) -- "... go read the Kimball book, some people say, go read the Eman book ..." Should this be -- "... go read the Kimball book, some people say, go read the Inmon book ..." (Bill Inmon)

    • @EcZachly_
      @EcZachly_ 1 year ago +3

      You’re totally right! Nice catch!

    • @Milhouse77BS
      @Milhouse77BS 11 months ago

      Inmon is unreadable. Makes Kimball look like Shakespeare.

  • @maleldil1
    @maleldil1 10 months ago +3

    Why have the end date be in the future instead of just null?

    • @EcZachly_
      @EcZachly_ 10 months ago +8

      BETWEEN syntax doesn’t work if end date is NULL
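
The reply above can be illustrated with a minimal sketch. It assumes a hypothetical `dim_user` table with ISO-formatted date strings and uses SQLite only because it ships with Python; the table, column names, and dates are invented for illustration and are not from the video. In standard SQL, `x BETWEEN a AND NULL` evaluates to NULL rather than true, so the open-ended row is silently dropped unless the end date is a far-future sentinel or is wrapped in `COALESCE`.

```python
import sqlite3

# Hypothetical SCD type 2 table; all names and dates are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_user ("
    "user_id INTEGER, favorite_food TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO dim_user VALUES (?, ?, ?, ?)",
    [
        (1, "lasagna", "2020-01-01", "2022-12-31"),  # closed historical row
        (1, "curry", "2023-01-01", None),            # current row, NULL end date
        (2, "pizza", "2021-01-01", "9999-12-31"),    # current row, sentinel end date
    ],
)

AS_OF = "2024-06-01"

# Plain BETWEEN: the comparison against NULL yields NULL, so user 1 is filtered out.
with_between = conn.execute(
    "SELECT user_id, favorite_food FROM dim_user "
    "WHERE ? BETWEEN start_date AND end_date ORDER BY user_id",
    (AS_OF,),
).fetchall()

# Workaround if NULL end dates are kept: substitute a sentinel at query time.
with_coalesce = conn.execute(
    "SELECT user_id, favorite_food FROM dim_user "
    "WHERE ? BETWEEN start_date AND COALESCE(end_date, '9999-12-31') "
    "ORDER BY user_id",
    (AS_OF,),
).fetchall()

print(with_between)   # only the sentinel-dated row matches
print(with_coalesce)  # both current rows match
```

Storing the sentinel in the table keeps every point-in-time query a plain `BETWEEN`; keeping NULL means each query has to remember the `COALESCE`.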

  • @everythingalevels6645
    @everythingalevels6645 2 months ago +2

    please try Biryani. It will become your no. 1 and it will be a permanent dimension then

  • @4.0Solutions
    @4.0Solutions 4 months ago

    Why would you not just have the end-date null for the current dimension?

  • @wtfzalgo
    @wtfzalgo 9 months ago +2

    I like the metaphorical explanation. Why don't you write your own platform agnostic data modeling book?

  • @patparillo
    @patparillo 1 year ago +2

    I know you mentioned learning by doing, which is my preferred approach as well. However, do you have any resources for learning about SCD type 1, type 2, etc.?

    • @ZachRenwickData
      @ZachRenwickData 1 year ago

      Kimball's The Data Warehouse Toolkit has in-depth explanations of slowly changing dimensions (all types)

  • @srinubathina4495
    @srinubathina4495 10 months ago

    I want to learn data modeling from you. Do you offer any course for that? I want to gain in-depth knowledge of this concept.

  • @workmode2073
    @workmode2073 1 year ago +3

    What about modeling in MPPs like Redshift? Traditional dimensions/facts do not match the architecture of MPPs

    • @EcZachly_
      @EcZachly_ 1 year ago +1

      Those are more denormalized, you’re right!

    • @workmode2073
      @workmode2073 1 year ago

      @EcZachly_ Most companies are using MPPs these days simply for the speed/efficiency-to-cost ratio. So why are companies still testing fact/dimension/PK-FK-based data modeling knowledge?

  • @Teluguhiker
    @Teluguhiker 2 months ago

    Wouldn't storing age be a bad idea? Just store the year

  • @satz611
    @satz611 4 months ago

    Dimension snapshots are all you need to know. SCDs are outdated and inefficient now that storage and compute have gotten cheaper.

    • @EcZachly_
      @EcZachly_ 4 months ago

      SCDs are still very worth it once your dimensions hit a certain scale.
      Airbnb still considers them the gold standard. Maxime wrote that article and the other Airbnb data architects decided he was wrong.

    • @satz611
      @satz611 4 months ago +1

      @EcZachly_ I see. I haven't seen any SCDs at Meta, though maybe I didn't explore enough, and I'm not sure about the Airbnb use cases. It's a bit challenging to read from SCDs in reporting tools (Unidash or Tableau), and we often access detailed dimensions for many metrics. Besides, MapReduce architectures don't directly support updates: you'd have to recompute the whole dataset (omitting the older version and adding the newer one) for a single update. I felt the dimension snapshot approach made a lot of sense for many practical use cases, even at the cost of storage and compute. Yes, there are drawbacks. For example, you can't keep unlimited history when capturing full snapshots, but that can be addressed with Hist, first/last events, or datelist fields. I would like to know the use cases where SCDs are optimal.
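
To make the trade-off in this thread concrete, here is a rough sketch of how daily dimension snapshots relate to SCD type 2 rows. The function name, the row layout, and the far-future sentinel date are all assumptions made for illustration, and the sketch assumes one snapshot per key per day with no gaps. It is not how Meta or Airbnb implement either approach.

```python
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # assumed sentinel end date for the current row


def snapshots_to_scd2(snapshots):
    """Collapse (snapshot_date, key, value) rows into SCD type 2 date ranges.

    Assumes one snapshot per key per day with no missing days.
    """
    rows = []
    open_rows = {}  # key -> index in `rows` of that key's currently open row
    for snap_date, key, value in sorted(snapshots):
        idx = open_rows.get(key)
        if idx is not None and rows[idx]["value"] == value:
            continue  # value unchanged: the open row simply keeps covering this day
        if idx is not None:
            # Value changed: close the previous row the day before the change.
            rows[idx]["end_date"] = snap_date - timedelta(days=1)
        rows.append(
            {"key": key, "value": value, "start_date": snap_date, "end_date": OPEN_END}
        )
        open_rows[key] = len(rows) - 1
    return rows


# Five daily snapshots for one hypothetical user whose country changes once.
daily = [(date(2025, 1, d), 1, "US") for d in (1, 2, 3)]
daily += [(date(2025, 1, d), 1, "CA") for d in (4, 5)]

scd_rows = snapshots_to_scd2(daily)
for row in scd_rows:
    print(row)
```

Snapshots cost one row per key per day but are trivial to query and append; the SCD form stores one row per change, which is where the savings appear once a dimension is large and changes rarely.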

  • @TheHermitProcess
    @TheHermitProcess 11 months ago +2

    Std or SCD.😂😂😂😂😂 thanks!

    • @techgraph1233
      @techgraph1233 9 months ago

      I was going to comment same 😝😂😂😂😂😂😂.
      He said that’s how you kind of get the STD. Lol

  • @filbertejess8711
    @filbertejess8711 1 year ago +1

    *promosm*