Advancing Spark - Exploring DLT Event Metrics

  • Published: 1 Dec 2021
  • One of the huge benefits of Delta Live Tables is the metrics that are generated automatically - logging rows processed, durations, audit information and, crucially, the results of the expectations placed on each table. However, it can be pretty tricky to pull out the metrics from the underlying tables.
    In this video, Simon walks through the improvements to the DQ monitoring within DLT pipelines themselves, and then runs through the process of manually querying the event log and augmenting his DLT pipeline with some prepared data quality metrics.
    For more info on Delta Live Tables, check out the docs here: docs.microsoft.com/en-us/azur...
    As always, feel free to get in touch with Advancing Analytics if we can help you get to where you need to be on your Lakehouse journey.
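    A minimal sketch of the kind of event-log query walked through in the video is below, assuming the default layout where the event log is a Delta table under the pipeline's storage location at <storage>/system/events; the path, the pipeline id and the exact shape of the details JSON are assumptions to adapt for your own pipeline.

```python
from pyspark.sql.functions import col, explode, from_json, get_json_object

# Assumed location: by default the DLT event log is written as a Delta table
# under the pipeline's storage location at <storage>/system/events.
event_log = spark.read.format("delta").load(
    "dbfs:/pipelines/<pipeline-id>/system/events"
)

# Assumed schema for the expectations array inside the `details` JSON payload.
expectation_schema = (
    "array<struct<name:string,dataset:string,"
    "passed_records:long,failed_records:long>>"
)

dq_metrics = (
    event_log
    .where(col("event_type") == "flow_progress")
    .select(
        col("timestamp"),
        col("origin.flow_name").alias("flow_name"),
        explode(
            from_json(
                get_json_object(
                    col("details"),
                    "$.flow_progress.data_quality.expectations",
                ),
                expectation_schema,
            )
        ).alias("expectation"),
    )
    .select(
        "timestamp",
        "flow_name",
        col("expectation.name").alias("expectation_name"),
        col("expectation.passed_records").alias("passed_records"),
        col("expectation.failed_records").alias("failed_records"),
    )
)

dq_metrics.show(truncate=False)
```

    Rows without data-quality details (maintenance events, for example) drop out naturally because explode skips null arrays, leaving one row per flow update per expectation.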

Comments • 9

  • @chrisstephenson9890 · 2 years ago

    Very informative, thank you. I very much like the additional features DLT and autoloader are adding to provide for standardised ELT frameworks.

  • @kurtmaile9977 · 2 years ago

    Great video as always, thanks, some of my favourite listening, particularly around DLT, which we are actively looking at ourselves! And great topic on DQ and how to handle it with DLT.
    Similar to Darryll's question, I would love to see you explore the process of handling failed expectations as the natural next step after trapping and collecting the metrics. We next need to handle those records in the DLT pipeline that failed. For me there are two key scenarios:
    1) A failure requiring manual intervention on a single record within a batch, where it can be patched within Databricks itself, e.g. identify the suspect row(s), write SQL to update the row in the source, and let the CDC change stream feed it into the processing again and pass through. I can then see some level of incident 'run(note)book automation' to fix this if it is predictable enough and the issue can't be fixed at source.
    2) Transient failures - for me this is the biggest / most common one, and it is often observed with late-arriving data at the point of a stream join where that enrichment is needed (and thus expressed as an expectation on the target table). E.g. a SalesOrderDetail row being enriched with Product/SKU info (i.e. another DLT table), but for whatever reason (e.g. the product master feed is down for a period of time) the product row is not present at the time of the SalesOrderDetail join. BUT we expect it to eventually arrive within some acceptable bound of time.
    This can and should be expected in a distributed system anyway - ideally what you would want is some automated retry of the failed rows (under some condition, e.g. up to 24 hours, or x retries, retrying every x period of time), where we expect them to eventually join correctly and then pass the expectation on the target table, without any intervention needed. In essence you suspect it will self-heal over time when the dependent data arrives; it's just a matter of not 'hard failing' and then needing manual intervention.
    Would love to hear your thoughts on this (scenario 2 in particular) and even some level of demo! :) My understanding is you'd need to hand-roll something now for both use cases. Point 1) is understandable; for 2), I would love to see something natively built into DLT (perhaps there is and I'm not aware), but in the absence of that, what are your thoughts on handling such a scenario?
    Thanks heaps, keep up the good work
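    [A minimal, hand-rolled sketch of the quarantine idea behind scenario 2: alongside the enriched table, a second table keeps the rows that fail the join expectation so they can be replayed once the late-arriving reference data lands. DLT has no built-in retry for this; the table, column and rule names below are hypothetical.]

```python
import dlt
from pyspark.sql.functions import expr

# The rule an enriched row must satisfy: the order line found a matching product.
HAS_PRODUCT = "product_id IS NOT NULL"

@dlt.table(name="sales_order_detail_enriched")
@dlt.expect_or_drop("valid_product", HAS_PRODUCT)
def sales_order_detail_enriched():
    # Hypothetical upstream datasets -- adjust names to your own pipeline.
    orders = dlt.read_stream("sales_order_detail_raw")
    products = dlt.read("product_master")
    return orders.join(products, on="sku", how="left")

@dlt.table(name="sales_order_detail_quarantine")
def sales_order_detail_quarantine():
    # Inverse of the rule: keep rows where the join found no product yet, so a
    # later run or a separate repair job can re-process them once it arrives.
    orders = dlt.read_stream("sales_order_detail_raw")
    products = dlt.read("product_master")
    return (
        orders.join(products, on="sku", how="left")
              .where(expr(f"NOT ({HAS_PRODUCT})"))
    )
```

    [The retry loop itself, re-driving quarantined rows back through the join after x hours or x attempts, still has to be scheduled outside the pipeline, which is exactly the gap the comment points at.]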

  • @drummerboi4eva · 1 year ago

    Thanks a lot Simon, clear ideas, architecture and execution!! :)

  • @TechMomentAI · 2 years ago · +3

    Great video thanks. Can you share the notebooks you used?

  • @RajanieshKaushikk · 2 years ago

    Simply Awesome!!

  • @darryll127 · 2 years ago · +1

    How can we specifically identify which rows and which expectations on individual rows failed?

  • @ravirajuvysyaraju123 · 2 years ago

    Thanks

  • @kaurivneet1 · 2 years ago

    As always, brilliant content. I like the way you structure your approach. One question: can we create a column (e.g. dqcheckfailed Y/N) to flag rows in the table which don't meet expectations? That way, if the data is corrected back in the source and reprocessed into the lake, the column value gets updated. This would give an accurate count of problematic rows in the data.
    The above can be achieved by building a custom data quality framework, but I was wondering if it can be baked into DLT.
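    [A minimal sketch of how that flag could be hand-rolled today: declare the rules as warning-only expectations so the pass/fail metrics still land in the event log, and derive the flag column from the same rule strings. The rule, table and column names are hypothetical.]

```python
import dlt
from pyspark.sql.functions import expr

# Hypothetical rules -- the same strings drive both the expectations and the flag.
RULES = {
    "valid_order_id": "order_id IS NOT NULL",
    "valid_amount": "amount >= 0",
}

@dlt.table(name="orders_with_dq_flag")
@dlt.expect_all(RULES)  # warning-only: metrics are recorded, no rows are dropped
def orders_with_dq_flag():
    df = dlt.read("orders_cleansed")  # hypothetical upstream table
    # Flag a row when it violates any rule; if the source row is corrected and
    # reprocessed (CDC upsert or full refresh), the flag is simply recomputed.
    failed_any = " OR ".join(f"NOT ({cond})" for cond in RULES.values())
    return df.withColumn("dq_check_failed", expr(failed_any))
```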

  • @NeumsFor9 · 1 year ago

    The vendor has a lot of nerve charging premium prices and then leaving the engineer to "keep on digging". They've just got to do better. If I am buying the Expensive Car, I should not need to jury-rig and jiggle the steering wheel to get my car to start.
    Simon: Many thanks.
    Databricks: Sew it together more professionally.