Building a robust data pipeline with dbt, Airflow, and Great Expectations

  • Published: 12 Sep 2024

Comments • 14

  • @mahmoudnabil3667 2 years ago +1

    Appreciate your simple explanations, thanks!

  • @kshitijpathak8646 2 years ago +6

    Great explanations! One quick question: in dbt we can test/profile our data during the 'T' (Transformation) step and also run tests on source data. How is Great Expectations different from the features already built into dbt?

    • @personalbranddata 1 year ago +6

      Frankly, it's unnecessary to incorporate Great Expectations into the stack. You could indeed just replace it with dbt tests and be done.
      There are just a few things to consider when choosing between the two:
      - Great Expectations tests are written in Python, while dbt tests are written (mostly) in SQL. You might find a particular test easier to implement one way or the other. Personally, I think dbt tests cover everything you need in 99% of cases.
      - Great Expectations provides more kinds of tests out of the box. To counter this point, though, the dbt-expectations package provides out-of-the-box tests as well.
      - [This one is huge!] Great Expectations requires you to load data from the database into your Python processing environment, while dbt tests run inside the database. So Great Expectations tests tend to be more expensive and take longer to complete than equivalent dbt tests.
      - Great Expectations produces data quality reports. Perhaps they are important to your organisation.
      I'm sure there are arguments for using Great Expectations as well. You certainly could use both, as the talk suggests. I just couldn't think of a good use case. The talk didn't convince me, but in its defense it's been over 2 years. Perhaps dbt tests and the dbt-expectations package had fewer features at that time. Or perhaps they had a legacy codebase with Great Expectations tests before they adopted dbt and didn't want to re-implement them.
      In any case, my recommendation would be to ignore this advice and just go with dbt tests in most cases.
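
      A minimal sketch of the "inside the database vs. loaded into Python" distinction described in the comment above. It uses only Python's built-in sqlite3 module, not dbt or Great Expectations themselves, and the `orders` table, its columns, and both function names are hypothetical. It only illustrates where the check runs and how many rows leave the database.

```python
import sqlite3

# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, 11), (3, None), (4, 12)],
)


def null_count_in_database(conn):
    """In-database style: the database counts failing rows and
    returns a single number to the client."""
    (failures,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL"
    ).fetchone()
    return failures


def null_count_in_python(conn):
    """Load-then-check style: every row is transferred to Python
    before the check runs."""
    rows = conn.execute("SELECT order_id, customer_id FROM orders").fetchall()
    failures = sum(1 for _, customer_id in rows if customer_id is None)
    return failures, len(rows)


in_db = null_count_in_database(conn)
in_py, rows_transferred = null_count_in_python(conn)
print(in_db, in_py, rows_transferred)
```

      Both checks report the same number of failing rows; the difference is that the second one has to move the whole table out of the database first, which is the cost the comment is pointing at.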

    • @kshitijpathak8646 1 year ago

      @personalbranddata Thank you!

  • @FirstNameLastName-fv4eu 1 month ago

    This is like making a small problem Very Big then creating a Giant Open Source to solve it!

  • @brads2041 1 year ago +1

    It can be an interesting challenge to test raw source data, because you may have to apply some form of transform to get it into a testable state.

  • @johnfordice5763 3 years ago +3

    Would love to see a repo for this!

    • @dbt-labs 3 years ago +13

      A bit late, but we got you: github.com/spbail/dag-stack

  • @JanekBogucki 3 years ago

    Nice intro.

  • @willianrocha8615 3 years ago +1

    Code? Repo?

  • @naveenn3143 3 years ago

    Is there a repo to refer to?

  • @smileysuvarna5889 1 year ago

    How do I create a dbt pipeline?

  • @echo2net 3 years ago

    no repo?