Embedded ELT: Save your budget and simplify your data platform with Dagster Embedded ELT.

  • Published: Jul 26, 2024
  • Most data pipelines require data movement steps. While full-featured third-party services like Fivetran, Airbyte, or Stitch have become popular plug-and-play solutions, their cost tends to balloon over time. Often the value of these services is limited, especially for simpler tasks such as database-to-database replication.
    Pedram Navid, Head of Data Engineering and DevRel at Dagster Labs, explains the value of the newly released Embedded ELT feature in Dagster, which provides support for lightweight data movement libraries.
    One organization saved $30,000 a year, moving their database-to-database sync off Fivetran and onto Embedded ELT.
    This session will benefit heads of data engineering or ML looking to improve the performance and reduce the cost of their data platform.
    More context on Dagster's Embedded ELT capability can be found in the Launch Week blogpost: dagster.io/blog/dagster-embed...
    The demo in this session will use the following code: github.com/dagster-io/mdsfest...
    00:00 Introduction and the challenges of ELT
    01:23 The advent of data ingestion SaaS solutions (Fivetran, Stitch, Airbyte, etc.)
    04:02 Why Embedded ELT - the value of doing data ingestion straight from the orchestrator
    06:45 What is Dagster Embedded ELT? Use the Sling and dlt lightweight data movement libraries and enjoy the best of ingestion + orchestration
    07:57 Example Architectures your CFO will love
    09:44 Embedded ELT demo
    15:06 Future roadmap
    Try Dagster today with a 30-day free trial: dagster.io/lp/dagster-cloud-t...
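    As a rough illustration of the lightweight ingestion style the talk describes, Sling drives a database-to-database sync from a small replication config. The sketch below is hypothetical: the connection names and stream tables are placeholders, not from the linked demo repo.

    ```yaml
    # Hypothetical Sling replication config: sync two Postgres tables to BigQuery.
    source: MY_POSTGRES      # connection names are placeholders, defined separately
    target: MY_BIGQUERY

    defaults:
      mode: full-refresh
      object: '{stream_schema}_{stream_table}'   # target table naming pattern

    streams:
      public.users:                  # replicated with the defaults above
      public.orders:
        mode: incremental            # only pull new/changed rows
        primary_key: [id]
        update_key: updated_at
    ```

    Embedded ELT's value proposition is that a config like this runs inside the orchestrator, so scheduling, observability, and lineage come from Dagster rather than a separate paid service.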
  • Science

Comments • 9

  • @colton.padden
    7 months ago +4

    TIL of Sling - thanks Pedram!

    • @ingenieroriquelmecagardomo4067
      7 months ago +2

      dlt is better - light years beyond Sling imo, and they have a better, more committed and active team. dlt is like the Dagster of the lightweight integration libraries.

  • @JimRohn-u8c
    7 months ago +2

    Does Dagster work on premises as well as the cloud?
    Are any features on cloud not available on premises?
    My company is moving back to on premises and doesn’t want to use SSIS.

    • @dagsterio
      7 months ago +1

      Hi Joshi. Dagster is an open-source project and all the capabilities that Pedram describes in this video are available in the open-source solution, which you can self-host locally or on-prem. Dagster Cloud offers additional capabilities, which are detailed at dagster.io/cloud or on the pricing page here: dagster.io/pricing

  • @AbhishekAgrawal-dv1id
    2 months ago

    If the requirement is to get data from S3 files into a BQ table, but to perform some validations on those files before inserting into the table, how would we do it with Embedded ELT? We are using Dagster OSS heavily and looking to use Embedded ELT for getting data from files, tables, and APIs.

    • @tim-at-elementl
      2 months ago +1

      Hey Abhishek! In your case, would you be able to represent the S3 files as source assets first, add asset checks onto those, and run Embedded ELT only if those asset checks pass? Sling currently (afaik) is heavily focused on doing ingestion well, so you can defer to the rest of the Dagster ecosystem (such as asset checks) for validations.

    • @AbhishekAgrawal-dv1id
      2 months ago

      @tim-at-elementl Thanks, Tim. How would you rate dlt for my use case? I see dlt is far more mature.

    • @tim-at-elementl
      2 months ago

      @AbhishekAgrawal-dv1id We've found that dlt is a powerful framework for ingesting from APIs, and it's definitely mature enough for production settings. I'll also note that neither Sling's nor dlt's integration currently allows for creating asset checks in-flight during ingestion.
      Instead, have you thought about ingesting the files into a quarantined dataset first using whichever tool you'd like, applying asset checks to that, and then moving the data to your real "analytics-ready" BQ datasets once you've vetted it? That way you can easily do ad hoc analysis to understand why the data failed quality tests, while keeping it isolated from your production analytics.
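      The quarantine-then-promote pattern Tim describes can be sketched in plain Python, independent of any specific ingestion tool. All names here (the check function, the row fields) are made up for illustration and are not a Dagster or dlt API:

      ```python
      # Sketch of the quarantine-then-promote pattern: land raw rows in a
      # quarantine area, run validations, and only promote rows that pass.

      def required_fields_present(row):
          """A toy asset-check-style validation on a single row."""
          return row.get("id") is not None and row.get("amount", 0) >= 0

      def promote(quarantine_rows, checks):
          """Split quarantined rows into analytics-ready rows and rejects."""
          passed, failed = [], []
          for row in quarantine_rows:
              (passed if all(check(row) for check in checks) else failed).append(row)
          return passed, failed

      raw = [
          {"id": 1, "amount": 9.5},
          {"id": None, "amount": 3.0},   # fails: missing id
          {"id": 2, "amount": -1.0},     # fails: negative amount
      ]

      analytics_ready, rejected = promote(raw, [required_fields_present])
      print(len(analytics_ready), len(rejected))  # 1 row promoted, 2 kept aside
      ```

      In a real pipeline the `promote` step would write the passing rows to the production BQ dataset while the rejects stay in the quarantine dataset for inspection.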

    • @AbhishekAgrawal-dv1id
      2 months ago

      Yeah, I am also leaning towards doing something like this. Thanks for this, Tim.
      Would you suggest using a similar approach to pull data from a different database? We'd still need to run minor validations on the incoming data, though. Would dlt help here at all?