Rethinking Orchestration as Reconciliation: Software-Defined Assets in Dagster

Databricks

3.8K views

Published: 18 Jul 2022
This talk discusses “software-defined assets”, a declarative approach to orchestration and data management that makes it drastically easier to trust and evolve datasets and ML models. Dagster is an open source orchestrator built for maintaining software-defined assets.
In traditional data platforms, code and data are only loosely coupled. As a consequence, deploying changes to data feels dangerous, backfills are error-prone and irreversible, and it’s difficult to trust data, because you don’t know where it comes from or how it’s intended to be maintained. Each time you run a job that mutates a data asset, you add a new variable to account for when debugging problems.
Dagster proposes an alternative approach to data management that tightly couples data assets to code: each table or ML model corresponds to the function that’s responsible for generating it. This results in a “Data as Code” approach that mirrors the “Infrastructure as Code” approach central to modern DevOps. Your Git repo becomes the source of truth for your data, so pushing data changes feels as safe as pushing code changes. Backfills become easy to reason about. You trust your data assets because you know how they’re computed and can reproduce them at any time. The role of the orchestrator is to ensure that the physical assets in the data warehouse match the logical assets defined in code, so each job run is a step towards order.
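To make "orchestration as reconciliation" concrete, here is a toy model in plain Python. It is a minimal sketch under stated assumptions, not Dagster's API: `asset`, `reconcile`, the explicit `version` string, and the in-memory `warehouse` dict are all hypothetical stand-ins. Each asset is declared as the function that produces it, dependencies are inferred from parameter names (an assumption for this sketch), and `reconcile` rebuilds only those assets whose stored copy no longer matches the code, visiting upstream assets first.

```python
import inspect
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical registry of *logical* assets: asset name -> (version, function).
DEFINITIONS: dict[str, tuple[str, Callable[..., Any]]] = {}


def asset(version: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Declare a function as the code that produces the asset named after it."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        DEFINITIONS[fn.__name__] = (version, fn)
        return fn
    return register


def upstream(fn: Callable[..., Any]) -> list[str]:
    """Assumption for this sketch: parameter names name the upstream assets."""
    return list(inspect.signature(fn).parameters)


def reconcile(store: dict[str, tuple[str, Any]]) -> list[str]:
    """Rebuild every physical asset that no longer matches its definition.

    `store` stands in for the warehouse: asset name -> (version, value).
    Returns the names of the assets that were (re)materialized, in order.
    """
    graph = {name: upstream(fn) for name, (_, fn) in DEFINITIONS.items()}
    rebuilt: list[str] = []
    # static_order() yields every asset after all of its upstream assets.
    for name in TopologicalSorter(graph).static_order():
        version, fn = DEFINITIONS[name]
        stale = (
            name not in store                              # never materialized
            or store[name][0] != version                   # code changed
            or any(dep in rebuilt for dep in graph[name])  # an input changed
        )
        if stale:
            inputs = {dep: store[dep][1] for dep in graph[name]}
            store[name] = (version, fn(**inputs))
            rebuilt.append(name)
    return rebuilt


@asset(version="1")
def raw_orders() -> list[dict[str, int]]:
    return [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]


@asset(version="1")
def order_total(raw_orders: list[dict[str, int]]) -> int:
    return sum(row["amount"] for row in raw_orders)


warehouse: dict[str, tuple[str, Any]] = {}
first = reconcile(warehouse)   # ['raw_orders', 'order_total']
second = reconcile(warehouse)  # []: the warehouse already matches the code


# Changing an asset's code (signalled here by bumping its version) makes it stale.
@asset(version="2")
def order_total(raw_orders: list[dict[str, int]]) -> int:
    return max(row["amount"] for row in raw_orders)


third = reconcile(warehouse)   # ['order_total']: raw_orders is left untouched
```

Because each call only closes the gap between what the code declares and what is stored, running `reconcile` again when nothing has changed does no work. A real orchestrator would also persist versions and lineage alongside the data rather than in a Python dict.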
Software-defined assets are a natural approach to orchestration for the modern data stack, in part because dbt models are a type of software-defined asset.
Attendees of this session will learn how to build and maintain lakehouses of software-defined assets with Dagster.