It would be really helpful if you covered some of the topics you mentioned at the end, especially dependency isolation, given Python's dependency model.
I don’t know… this video is one year old, but still uses the legacy DAG syntax from Airflow 1, rather than the TaskFlow API from Airflow 2. So the syntax doesn’t make a difference anymore.
Regarding the coupling to environment: Airflow has different executors. The KubernetesPodOperator is not the only way to run on a Kubernetes environment.
The rest may or may not be true. Probably there are many things that Dagster does better than Airflow. But I’m disappointed that you would publish such a biased comparison.
Tried Dagster a few days back, liked it but have a weird need: some workloads need to run in TypeScript, Rust or Haskell, as there is some parsing happening for which I don't have any Python libs available atm.
How would you solve this problem?
I'm thinking of 1) hitting an external service that runs my code in whatever runtime it needs or 2) use Temporal which has a JavaScript/TypeScript SDK in addition to Python and Golang afaict.
Curious to hear your characterization of the diff between Temporal and Dagster. Haven't done the deep-dive myself yet.
Hey David - we generally recommend using the dagster-shell library to invoke code in other languages inside Dagster pipelines.
A couple high-level differences between Dagster and Temporal:
- Dagster is focused on data pipelines, while Temporal is more focused on application-related workflows
- Dagster involves declaring the target state up-front, while I believe Temporal is more dynamic
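On invoking code in other languages from a pipeline step: the underlying idea is shelling out to an external program and capturing its output. Below is a minimal sketch of that idea using only Python's standard `subprocess` module. It is not the dagster-shell API, and `run_external`, `STAND_IN_CMD` and the `./my_rust_parser` path in the comment are hypothetical names made up for illustration. A second Python interpreter stands in for the Rust/Haskell/TypeScript binary so the snippet runs anywhere.

```python
import subprocess
import sys


def run_external(cmd: list[str], payload: str) -> str:
    """Run an external program, send `payload` on stdin, return its stdout."""
    result = subprocess.run(
        cmd,
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Stand-in for a parser written in another language: a second Python
# interpreter that upper-cases whatever it reads from stdin.
# In a real setup cmd might be ["./my_rust_parser"] (hypothetical path).
STAND_IN_CMD = [
    sys.executable,
    "-c",
    "import sys; print(sys.stdin.read().upper())",
]

if __name__ == "__main__":
    print(run_external(STAND_IN_CMD, "hello"))
```

A function like this could be called from inside any orchestrator's task or op; the orchestrator only sees a Python function that returns a string or raises on failure.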
If you look at the most recent version of Airflow, it also has decorators and dbt support. On the other hand, Apache Airflow is free. Nice try on comparing Dagster 🙃 with Airflow 🙂, but I'm sticking with AIRFLOW.
Yes. I'm not saying this video is wrong or that I prefer Airflow myself, BUT after surveying
- Mage
- Prefect
- Dagster
- AWS StepFunctions + EventBridge
I've found that all vendors seem to be reacting to Airflow 1.0. "We struggled with Airflow 1.0 so we built our own orchestrator product."
Airflow 2.0 seems to have rebuttals to most of the pain points I personally faced when maintaining Airflow 1.0 on Kubernetes back in 2019.
I'm still open to using a different orchestration tool after my experience, but I need to gather accurate information about the *current* state of the space before making that kind of long-term decision.
Can Dagster be used to orchestrate a Spark Streaming job on YARN that pulls data from Kafka and writes to HDFS? The idea is that if the Spark Streaming job gets stuck in the queue, it can be detected, alerted on, and restarted automatically by Dagster. Or would Airflow be the right tool for this?
I don't understand what "can run only in production" means... As if you could not have one instance pointing to non-production environments and another pointing to production environments, and manage the versions of your code with any Git tool. :\
Hey Giuseppe - yes, you can stand up an Airflow instance inside a non-production environment. However, the programming model encourages you to write DAGs in a way that binds them to particular environments, and Airflow is heavy-weight in a way that makes it difficult to use as part of a local development workflow.
@s_ryz Why wouldn't one just parameterize those as variables? There's no way Airflow encourages you to not use variables and hardcode stuff for production. Maybe you have a proper example explaining your point? Otherwise, it's just that you are commenting without understanding Airflow best practices.
@kalyanben10 He gave a good example. If your Airflow ETL runs in a Kubernetes cluster in prod, the only way to test it locally would be to run the entire cluster on your host. With Dagster, your pipeline is decoupled from its runtime environment, so you would be able to test the same pipeline within a Python process on your machine, for example.
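To make the "decoupled from its runtime environment" point concrete, here is a rough sketch in plain Python. It does not use Dagster's or Airflow's actual APIs: `Storage`, `InMemoryStorage`, `clean_orders` and `run_pipeline` are hypothetical names invented for this example. The assumption is that the pipeline logic receives its environment-specific pieces (here, where results get written) as an argument, so the same code can run in a local process with a stand-in.

```python
from typing import Protocol


class Storage(Protocol):
    """Anything the pipeline can write results to (hypothetical interface)."""

    def write(self, key: str, rows: list[dict]) -> None: ...


class InMemoryStorage:
    """Local stand-in for tests and development; keeps results in a dict."""

    def __init__(self) -> None:
        self.data: dict[str, list[dict]] = {}

    def write(self, key: str, rows: list[dict]) -> None:
        self.data[key] = rows


def clean_orders(raw: list[dict]) -> list[dict]:
    """Business logic: keep only orders with a positive amount."""
    return [row for row in raw if row.get("amount", 0) > 0]


def run_pipeline(raw: list[dict], storage: Storage) -> int:
    """Run the pipeline against whatever storage the caller supplies."""
    cleaned = clean_orders(raw)
    storage.write("orders_clean", cleaned)
    return len(cleaned)


if __name__ == "__main__":
    local = InMemoryStorage()
    count = run_pipeline([{"amount": 5}, {"amount": 0}, {"amount": 2}], local)
    print(count, local.data["orders_clean"])
```

In production the caller would pass a storage object backed by the real system instead of `InMemoryStorage`; the pipeline function itself would not change. Whether a given orchestrator makes this style easy or awkward is exactly what the thread is arguing about.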
All wrong claims: low developer productivity, catch errors in production, poor visibility.
I totally agree. With TaskFlow it is easily possible to achieve the same.
This is such a superficial and disingenuous comparison that I don't even want to write a comment in English 🤦
I'm fairly sure this sales guy never used Airflow
Nice try ;)