Some of my key takeaways:
1. 7:38 - “we want to eventually make it possible to run DLT locally outside of Databricks” - very important, good to know!
2. 15:50 - “we are not the only project kind of in this space ... things like DBT” - there is an ongoing question about DLT and DBT: are they the same, and what are the differences? See the answer at 40:29.
3. 17:12 - DLT has better autoscaling than DBX workflows - too bad they don’t bring those capabilities into workflows and let us enjoy them there.
4. 20:07 - “now we can use Unity catalog with DLT” - well, it isn’t that simple: the limitation list is very long, and managed locations/tables aren’t supported yet.
5. 22:20 - the medallion model is a cool idea but totally made up ... it should be used as a very useful vocabulary for data quality - so don’t obsess about it :slightly_smiling_face:
6. 26:09 - the Enzyme engine is used for incremental materialized view work - it doesn’t handle all types of queries, so take that into account; it will be supported in the long term, but there are no due dates.
7. 34:12 - always prefer expressions over UDFs.
8. DLT Serverless will bring a lot of capabilities - note: serverless isn’t supported on all cloud vendors yet.
9. To this day there is no option to debug DLT (as you can within a workflow notebook) without running the pipeline - per the answers in the chat, this should come in the future.
Really great initiative, giving insight into both the product and vision of the developers. Thank you so much for sharing and putting time and effort into publishing all your great videos 👌🙏
Great format. Lots of new and interesting things to think about. Looking forward to the next one.
Really excited about apply changes from snapshot! Is there a timeline for this feature? I know, I know, but I had to ask!
I can support this; it would be so great to have it available soon, at least in some kind of preview!
I want to see what my architecture diagram will look like when I put streaming, materialized views, serverless, DLT, etc. together for ingesting structured and semi-structured data.
It was mentioned at 14:10 that SCD2 sounds conceptually simple at first but can get so complicated that all the examples around streaming SCD2 in the docs needed to be corrected. I wonder if there is a good conceptual guide to all the possible SCD2 scenarios (with sample source and expected outcome records). Besides the usual update/insert/delete, there are interesting ones like "deleted row reinstated".
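A plain-Python toy model (not DLT code; all names invented) can at least enumerate the scenarios mentioned above — insert, update, hard delete, and "deleted row reinstated" — against a small SCD2 history, where an open row has `valid_to = None`:

```python
def apply_scd2_event(history, key, value, ts, op):
    """Apply one change event to a toy SCD2 history (a list of dicts).

    op is 'upsert' (insert or update) or 'delete' (hard delete in the source).
    """
    current = next((r for r in history
                    if r["key"] == key and r["valid_to"] is None), None)
    if op == "delete":
        if current:
            current["valid_to"] = ts      # close the open row; no new version
        return history
    if current:
        if current["value"] == value:
            return history                # no-op change: keep history clean
        current["valid_to"] = ts          # close the old version
    # A new open row covers: new key, changed value, and "deleted row reinstated"
    history.append({"key": key, "value": value, "valid_from": ts, "valid_to": None})
    return history

h = []
apply_scd2_event(h, "A", 1, ts=1, op="upsert")     # insert
apply_scd2_event(h, "A", 2, ts=2, op="upsert")     # update: closes v1, opens v2
apply_scd2_event(h, "A", None, ts=3, op="delete")  # hard delete: closes v2
apply_scd2_event(h, "A", 3, ts=4, op="upsert")     # reinstated: a new open row
```

The end state is three rows: two closed versions and one open "reinstated" row, with a gap between `ts=3` and `ts=4` during which the key did not exist.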
Great video, great series. Please mute next time when your guest is speaking, because the echo is super annoying - no problem ;)
I'm using SCD2 apply changes. I would like the end date to be updated when there is a hard delete in the raw data. Outside of DLT I could use WHEN NOT MATCHED BY SOURCE, but within DLT it doesn't seem possible. Is it?
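For what it's worth, `apply_changes` can close the SCD2 row when the feed carries an explicit delete event, via `apply_as_deletes`. A hedged sketch (table and column names are invented; this fragment only runs inside a Databricks DLT pipeline, not standalone):

```python
import dlt
from pyspark.sql.functions import expr

dlt.create_streaming_table("customers_scd2")

# Assumption: the CDC feed has an `operation` column that marks deletes.
dlt.apply_changes(
    target="customers_scd2",
    source="customers_cdc_feed",
    keys=["customer_id"],
    sequence_by="event_ts",
    apply_as_deletes=expr("operation = 'DELETE'"),
    stored_as_scd_type=2,
)
```

If the source deletes rows silently (they simply disappear, with no delete event), this doesn't help on its own - you would need to synthesize delete events upstream, which as I understand it is the gap the apply-changes-from-snapshot feature discussed in the video is meant to close.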