Fact Table Loading with ADF Mapping Data Flows

Поделиться
HTML-код
  • Опубликовано: 15 янв 2025

Комментарии • 14

  • @NeumsFor9
    @NeumsFor9 3 года назад

    May have mentioned this in another video, but in order to implement custom early arriving fact pattern from Project REAL (fully cached lookup and, when row is not matched, a custom process, on the error path calls a sproc which inserts into dimension, marks "Y" to InferredMembet, and "scope identiies" the matching SK back into the cache so that if the same BK shows up in a subsequent row, the SK is gotten is matched.
    Boy would I love if we could add that functionality to the lookup without having to do a custom activity.
    Currently, that's the only way to do that in a mapping data flow, correct?.
    I guess the challenge would be to broadcast that SK back into spark.....
    Any idea what the level of effort it would be for the team to do the equivalent of that project real custom component? I used it regularly in almost all of my SSIS ETL projects .....
    E-A-G-L-E-S.....EAGLES!!!!!!

  • @abdelkadersardi7242
    @abdelkadersardi7242 3 года назад

    great video! can you show us the process of loading the late arriving dimension. Thank you

  • @chandra939
    @chandra939 4 года назад +1

    Is there a way i can use the same lookup table to get surrogates for 5 different fields from the source, or i will have to create 5 different lookups for each field? Also i see there is a limitations on number of rows a lookup can handle in that case can i use joiner in above case where i have to get surrogate for 5 different fields from the same table?

    • @MSDataFactory
      @MSDataFactory  4 года назад +1

      You use the same source multiple times in multiple Lookup transforms. However, if you are using different conditions in your matching criteria, then you will need a new Lookup transform for each different condition.

    • @MSDataFactory
      @MSDataFactory  4 года назад +1

      There are no row limits in Lookup or Join

    • @chandra939
      @chandra939 4 года назад

      @@MSDataFactory Thank you for quick reply.

  • @montoyescoful
    @montoyescoful 4 года назад

    So.... if you execute this dataflow twice, then the final table will be result with duplicates values, does'n it?

    • @MSDataFactory
      @MSDataFactory  4 года назад

      When you operationlize this process, you'll implement a pattern where you either check "Move" or "Delete" files from the source after processing, or just query the rows from the last execution so that you don't record duplicate facts.

    • @WickedOne420100
      @WickedOne420100 4 года назад

      @@MSDataFactory Is there a pattern that shows this "operationalized" with different options for SQL or ADLS?

    • @MSDataFactory
      @MSDataFactory  4 года назад

      @@WickedOne420100 Part 3 in this series talks about operationalizing the ETL pipelines: ruclips.net/video/FFCbU4ujCiY/видео.html

  • @CHANLEY545454
    @CHANLEY545454 4 года назад

    I understand that this was for a demo purpose but would this work in real life with a different source table where you won't know what the new column additions/deletes would be from the source?

    • @MSDataFactory
      @MSDataFactory  4 года назад

      Yes. You can either tell the data flow what the new column is via parameters, or you can allow data flow to just pass it through, regardless of data type, name, or ordinal position. If you need to transform that column, use a column pattern. This is known as "schema drift" with "late bindings".

    • @CHANLEY545454
      @CHANLEY545454 4 года назад

      My apologizes for posting my Schema Drift question to the wrong video but yes it's about Schema Drift in general.
      Scenario: I have TableA and TableB. Both Source and Sink have Schema Drift with "infer drifted column types" checked on source.
      I did an initial load from A to B.
      I then added a column in A without touching B.
      I got this to work by checking "recreate table" in sink
      This actually makes sense but my long term goal is to build dimension and fact tables while doing incremental loads instead of full loads.
      We want to hand this solution to an Analyst with some SQL knowledge but don't want them to have to go into the DB and make DDL changes.
      Any suggestions?
      "new column is via parameters".. Hmm that sounds promising since I can inspect the source table(s) before the load. Would be awesome if you could post a video that covers the basics of not knowing at all how the source (TableA) has changed. I guess in my scenario you would have to know what changed on the source before hand though.

    • @MSDataFactory
      @MSDataFactory  4 года назад

      @@CHANLEY545454 Did you try these 2 videos? ruclips.net/video/urzLAb83IjU/видео.html ruclips.net/video/PIGw-Z-0upw/видео.html