May have mentioned this in another video, but in order to implement custom early arriving fact pattern from Project REAL (fully cached lookup and, when row is not matched, a custom process, on the error path calls a sproc which inserts into dimension, marks "Y" to InferredMembet, and "scope identiies" the matching SK back into the cache so that if the same BK shows up in a subsequent row, the SK is gotten is matched. Boy would I love if we could add that functionality to the lookup without having to do a custom activity. Currently, that's the only way to do that in a mapping data flow, correct?. I guess the challenge would be to broadcast that SK back into spark..... Any idea what the level of effort it would be for the team to do the equivalent of that project real custom component? I used it regularly in almost all of my SSIS ETL projects ..... E-A-G-L-E-S.....EAGLES!!!!!!
Is there a way i can use the same lookup table to get surrogates for 5 different fields from the source, or i will have to create 5 different lookups for each field? Also i see there is a limitations on number of rows a lookup can handle in that case can i use joiner in above case where i have to get surrogate for 5 different fields from the same table?
You use the same source multiple times in multiple Lookup transforms. However, if you are using different conditions in your matching criteria, then you will need a new Lookup transform for each different condition.
When you operationlize this process, you'll implement a pattern where you either check "Move" or "Delete" files from the source after processing, or just query the rows from the last execution so that you don't record duplicate facts.
I understand that this was for a demo purpose but would this work in real life with a different source table where you won't know what the new column additions/deletes would be from the source?
Yes. You can either tell the data flow what the new column is via parameters, or you can allow data flow to just pass it through, regardless of data type, name, or ordinal position. If you need to transform that column, use a column pattern. This is known as "schema drift" with "late bindings".
My apologizes for posting my Schema Drift question to the wrong video but yes it's about Schema Drift in general. Scenario: I have TableA and TableB. Both Source and Sink have Schema Drift with "infer drifted column types" checked on source. I did an initial load from A to B. I then added a column in A without touching B. I got this to work by checking "recreate table" in sink This actually makes sense but my long term goal is to build dimension and fact tables while doing incremental loads instead of full loads. We want to hand this solution to an Analyst with some SQL knowledge but don't want them to have to go into the DB and make DDL changes. Any suggestions? "new column is via parameters".. Hmm that sounds promising since I can inspect the source table(s) before the load. Would be awesome if you could post a video that covers the basics of not knowing at all how the source (TableA) has changed. I guess in my scenario you would have to know what changed on the source before hand though.
May have mentioned this in another video, but in order to implement custom early arriving fact pattern from Project REAL (fully cached lookup and, when row is not matched, a custom process, on the error path calls a sproc which inserts into dimension, marks "Y" to InferredMembet, and "scope identiies" the matching SK back into the cache so that if the same BK shows up in a subsequent row, the SK is gotten is matched.
Boy would I love if we could add that functionality to the lookup without having to do a custom activity.
Currently, that's the only way to do that in a mapping data flow, correct?.
I guess the challenge would be to broadcast that SK back into spark.....
Any idea what the level of effort it would be for the team to do the equivalent of that project real custom component? I used it regularly in almost all of my SSIS ETL projects .....
E-A-G-L-E-S.....EAGLES!!!!!!
great video! can you show us the process of loading the late arriving dimension. Thank you
Is there a way i can use the same lookup table to get surrogates for 5 different fields from the source, or i will have to create 5 different lookups for each field? Also i see there is a limitations on number of rows a lookup can handle in that case can i use joiner in above case where i have to get surrogate for 5 different fields from the same table?
You use the same source multiple times in multiple Lookup transforms. However, if you are using different conditions in your matching criteria, then you will need a new Lookup transform for each different condition.
There are no row limits in Lookup or Join
@@MSDataFactory Thank you for quick reply.
So.... if you execute this dataflow twice, then the final table will be result with duplicates values, does'n it?
When you operationlize this process, you'll implement a pattern where you either check "Move" or "Delete" files from the source after processing, or just query the rows from the last execution so that you don't record duplicate facts.
@@MSDataFactory Is there a pattern that shows this "operationalized" with different options for SQL or ADLS?
@@WickedOne420100 Part 3 in this series talks about operationalizing the ETL pipelines: ruclips.net/video/FFCbU4ujCiY/видео.html
I understand that this was for a demo purpose but would this work in real life with a different source table where you won't know what the new column additions/deletes would be from the source?
Yes. You can either tell the data flow what the new column is via parameters, or you can allow data flow to just pass it through, regardless of data type, name, or ordinal position. If you need to transform that column, use a column pattern. This is known as "schema drift" with "late bindings".
My apologizes for posting my Schema Drift question to the wrong video but yes it's about Schema Drift in general.
Scenario: I have TableA and TableB. Both Source and Sink have Schema Drift with "infer drifted column types" checked on source.
I did an initial load from A to B.
I then added a column in A without touching B.
I got this to work by checking "recreate table" in sink
This actually makes sense but my long term goal is to build dimension and fact tables while doing incremental loads instead of full loads.
We want to hand this solution to an Analyst with some SQL knowledge but don't want them to have to go into the DB and make DDL changes.
Any suggestions?
"new column is via parameters".. Hmm that sounds promising since I can inspect the source table(s) before the load. Would be awesome if you could post a video that covers the basics of not knowing at all how the source (TableA) has changed. I guess in my scenario you would have to know what changed on the source before hand though.
@@CHANLEY545454 Did you try these 2 videos? ruclips.net/video/urzLAb83IjU/видео.html ruclips.net/video/PIGw-Z-0upw/видео.html