"Building 3D Trusted Data Pipelines With Dagster, Dbt, and Duckdb" - Danh Phan (PyCon AU 2023)

  • Published: 23 Aug 2023
  • (Danh Phan) This talk demonstrates how to build trusted data pipelines with the "3D" packages: Dagster, dbt, and DuckDB. First, it presents why trusted data pipelines matter and how testing data quality builds that trust. It then discusses what needs to be tested, from high levels (like tables, relations, …) to low levels (like rows and columns). After that, I will show how to implement these tests using dbt packages such as dbt-utils and dbt-expectations.
    Finally, a demo of a complete ELT workflow will be presented. In this demo, Dagster orchestrates the data pipeline, and dbt handles data transformation together with its related testing packages. These pipelines run on top of DuckDB, which acts as a small data warehouse. The demo is published in a GitHub repository, so developers can clone and run it independently.
    The talk will help data and analytics engineers build more robust tests for their data pipelines. Trusted pipelines of this kind can improve data quality and validation, reducing the risk that issues such as data drift reach downstream channels.
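The table-level and column-level checks described in the abstract can be sketched in plain Python. This is a minimal illustration, not the talk's demo code: it uses the standard-library `sqlite3` module as a stand-in for DuckDB so it runs without extra installs, and the `run_checks` helper, the `orders` table, and its columns are hypothetical names. The column checks mirror the idea behind dbt's built-in `not_null` and `unique` tests.

```python
import sqlite3


def run_checks(conn, table, not_null_cols, unique_cols, min_rows=1):
    """Return a dict mapping each check name to True (pass) or False (fail).

    Hypothetical helper; table and column names are assumed to be trusted
    identifiers, since they are interpolated directly into the SQL text.
    """
    results = {}
    cur = conn.cursor()

    # Table-level check: the table holds at least `min_rows` rows.
    (row_count,) = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    results["row_count"] = row_count >= min_rows

    # Column-level check: the column contains no NULL values.
    for col in not_null_cols:
        (nulls,) = cur.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL"
        ).fetchone()
        results[f"not_null:{col}"] = nulls == 0

    # Column-level check: no non-NULL value appears more than once.
    for col in unique_cols:
        (dupes,) = cur.execute(
            f"SELECT COUNT(*) FROM ("
            f"SELECT {col} FROM {table} WHERE {col} IS NOT NULL "
            f"GROUP BY {col} HAVING COUNT(*) > 1)"
        ).fetchone()
        results[f"unique:{col}"] = dupes == 0

    return results


# Illustrative data: one NULL amount and one duplicated order_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, None), (2, 5.5)],
)

results = run_checks(
    conn,
    "orders",
    not_null_cols=["order_id", "amount"],
    unique_cols=["order_id"],
)
for name, passed in results.items():
    print(f"{name}: {'pass' if passed else 'FAIL'}")
```

In a dbt project such checks would normally be declared in YAML next to the model rather than written by hand; the sketch only shows what such a test asserts about the data.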
    pretalx.com/py...
    python, pycon, australia, programming, conference, technical, pyconline, developers, panel, sessions, libraries, frameworks, community, sysadmins, students, education, data, science
    Videos licensed as CC-BY-NC-SA 4.0
    PyCon AU is the national conference for the Python programming community, bringing together professional, student and enthusiast developers, sysadmins and operations folk, students, educators, scientists, statisticians, and many others besides, all with a love for working with Python.
    Licensed as CC BY-NC-SA - creativecommons...
    Produced by Next Day Video Australia: nextdayvideo.c...
    Fri Aug 18 15:30:00 2023 at Hall C (Plenary)
