Astronomer
  • 187 videos
  • 433,319 views
Quickstart ETL with Airflow (Step 9 of 9)
Sign up for a free Astro trial!
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?).
See this GitHub repository for the full code (github.com/astronomer/quickstart-etl-with-airflow-videos).
17 views

Videos

Quickstart ETL with Airflow (Step 8 of 9)
13 views · 2 hours ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). CLI commands:
psql -p 5434 -U postgres -d airflow_db (this will vary based on your Postgres setup)
SELECT * FROM weather_data.sunset_table;
See this GitHub repository for the full code (github...
Quickstart ETL with Airflow (Step 7 of 9)
27 views · 4 hours ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). See this GitHub repository for the full code (github.com/astronomer/quickstart-etl-with-airflow-videos/blob/main/include/vid7_code.py).
Exploring the Power of Airflow 3 at Astronomer with Amogh Desai
97 views · 7 hours ago
What does it take to go from fixing a broken link to becoming a committer for one of the world’s leading open-source projects? Amogh Desai, Senior Software Engineer at Astronomer, takes us through his journey with Apache Airflow. From small contributions to building meaningful connections in the open-source community, Amogh’s story provides actionable insights for anyone on the cusp of their op...
Quickstart ETL with Airflow (Step 6 of 9)
26 views · 7 hours ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). See this GitHub repository for the full DAG code (github.com/astronomer/quickstart-etl-with-airflow-videos/blob/main/include/vid6_code.py) and SQL statement (github.com/astronomer/quickstart-e...
Quickstart ETL with Airflow (Step 5 of 9)
20 views · 9 hours ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). Connection UI fields:
Connection ID: my_postgres_conn
Connection Type: Postgres
Host: your Postgres host
Database: your Postgres database
Login: your Postgres login
Password: your Postgres pas...
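For reference, a minimal sketch of reading through this connection once it is saved (it assumes the my_postgres_conn connection above and the weather_data.sunset_table used later in the series):

    from airflow.providers.postgres.hooks.postgres import PostgresHook

    # Look up the connection stored in Airflow and run a query against it.
    hook = PostgresHook(postgres_conn_id="my_postgres_conn")
    rows = hook.get_records("SELECT * FROM weather_data.sunset_table;")
    print(rows)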
Quickstart ETL with Airflow (Step 4 of 9)
31 views · 12 hours ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). See this GitHub repository for the full code (github.com/astronomer/quickstart-etl-with-airflow-videos/blob/main/include/vid4_code.py).
Quickstart ETL with Airflow (Step 3 of 9)
38 views · 14 hours ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). See this GitHub repository for the full code (github.com/astronomer/quickstart-etl-with-airflow-videos/blob/main/include/vid3_code.py).
Quickstart ETL with Airflow (Step 2 of 9)
49 views · 16 hours ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). See this GitHub repository for the full code (github.com/astronomer/quickstart-etl-with-airflow-videos/blob/main/include/vid2_code.py).
Quickstart ETL with Airflow (Step 1 of 9)
113 views · 19 hours ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). Commands and code from the video:
docker ps
brew install astro
astro dev init
astro dev start
Log in at localhost:8080 with admin for the username and password.
Using Airflow To Power Machine Learning Pipelines at Optimove with Vasyl Vasyuta
76 views · a day ago
Data orchestration and machine learning are shaping how organizations handle massive datasets and drive customer-focused strategies. Tools like Apache Airflow are central to this transformation. In this episode, Vasyl Vasyuta, R&D Team Leader at Optimove, joins us to discuss how his team leverages Airflow to optimize data processing, orchestrate machine learning models and create personalized c...
Maximizing Business Impact Through Data at GlossGenius with Katie Bauer
74 views · 14 days ago
Bridging the gap between data teams and business priorities is essential for maximizing impact and building value-driven workflows. Katie Bauer, Senior Director of Data at GlossGenius, joins us to share her principles for creating effective, aligned data teams. In this episode, Katie draws from her experience at GlossGenius, Reddit and Twitter to highlight the common pitfalls data teams face an...
Optimizing Large-Scale Deployments at LinkedIn with Rahul Gade
85 views · 21 days ago
Scaling deployments for a billion users demands innovation, precision and resilience. In this episode, we dive into how LinkedIn optimizes its continuous deployment process using Apache Airflow. Rahul Gade, Staff Software Engineer at LinkedIn, shares his insights on building scalable systems and democratizing deployments for over 10,000 engineers. Rahul discusses the challenges of managing larg...
How Uber Manages 1 Million Daily Tasks Using Airflow, with Shobhit Shah and Sumit Maheshwari
106 views · a month ago
When data orchestration reaches Uber’s scale, innovation becomes a necessity, not a luxury. In this episode, we discuss the innovations behind Uber’s unique Airflow setup. With our guests Shobhit Shah and Sumit Maheshwari, both Staff Software Engineers at Uber, we explore how their team manages one of the largest data workflow systems in the world. Shobhit and Sumit walk us through the evolutio...
Building Resilient Data Systems for Modern Enterprises at Astrafy with Andrea Bombino
96 views · a month ago
Efficient data orchestration is the backbone of modern analytics and AI-driven workflows. Without the right tools, even the best data can fall short of its potential. In this episode, Andrea Bombino, Co-Founder and Head of Analytics Engineering at Astrafy, shares insights into his team’s approach to optimizing data transformation and orchestration using tools like datasets and Pub/Sub to drive ...
Introduction to Data Products
116 views · a month ago
How to use SLAs for Data Pipelines
84 views · a month ago
Actionable Pipeline Insights with Astro Observe
109 views · a month ago
Inside Airflow 3: Redefining Data Engineering with Vikram Koka
174 views · a month ago
Building a Data-Driven HR Platform at 15Five with Guy Dassa
70 views · 2 months ago
The Intersection of AI and Data Management at Dosu with Devin Stein
108 views · 2 months ago
AI-Powered Vehicle Automation at Ford Motor Company with Serjesh Sharma
144 views · 3 months ago
From Task Failures to Operational Excellence at GumGum with Brendan Frick
137 views · 3 months ago
Building Modern Data Apps: Choosing the Right Foundation and Tools
84 views · 3 months ago
From Sensors to Datasets: Enhancing Airflow at Astronomer with Maggie Stark and Marion Azoulai
145 views · 3 months ago
Mastering Data Orchestration with Airflow at M Science with Ben Tallman
123 views · 3 months ago
Welcome to The Data Flowcast
90 views · 4 months ago
Enhancing Business Metrics With Airflow at Artlist with Hannan Kravitz
100 views · 4 months ago
Cutting-Edge Data Engineering at Teya with Alexandre Magno Lima Martins
435 views · 4 months ago
Airflow Strategies for Business Efficiency at Campbell with Larry Komenda
764 views · 4 months ago

Comments

  • @walterppk1989
    @walterppk1989 2 days ago

    The title is misleading. This video is not about Airflow 3; it's about an individual contributor's journey to becoming an Airflow contributor. That's cool, but not what I came here for. Please do better in the future.

  • @likithb3726
    @likithb3726 4 days ago

    Ma'am, when I run the command astro dev start I get the following error: Error: error building, (re)creating or starting project containers: Error response from daemon: error while creating mount source path '/host_mnt/Users/Bingumalla Likith/Desktop/MLOPS/airflow-astro/dags': mkdir /host_mnt/Users/Bingumalla Likith/Desktop: operation not permitted. Can you help me out with it? I'm using a Mac.

  • @marceloribeiro2548
    @marceloribeiro2548 20 days ago

    How about the logs?

  • @dhruvtyagi6118
    @dhruvtyagi6118 28 days ago

    I am able to install Astro, but I get an access-denied error when using astro dev init or any other astro command.

  • @Klifhunger
    @Klifhunger a month ago

    Insightful 🙏

  • @mranderson7306
    @mranderson7306 a month ago

    @Astronomer, hello! Could you please tell me how you open the .html documentation that is generated inside the Airflow Docker container through the web interface? When I navigate to "data_docs_url": file:///opt/airflow/gx/uncommitted/data_docs/local_site/index.html, I get a 404 error.

  • @shadabbigdel5017
    @shadabbigdel5017 2 months ago

    The issue with the KubernetesExecutor is that you cannot view the task logs in the Airflow UI because, with KubernetesExecutor, workers are terminated after their job finishes. This issue is not present with the Celery or CeleryKubernetesExecutor. I tried different solutions with Persistent Volumes (PV) and Persistent Volume Claims (PVC), but they didn’t work for me. At the end of the video, Marc also presented the issue, but no solution was provided. Does anyone here know how to resolve it?

    • @Astronomer
      @Astronomer 2 months ago

      Hey there, thanks for commenting. It is absolutely possible to get task logs in the Airflow UI when using K8s Executor. If you're working with OSS Airflow, you will need to either enable remote logging so Airflow grabs the logs before the pod spins down, or use a persistent volume to store them. With Astronomer, this is all handled automatically in our Astro Runtime. I'd recommend reading more in the docs here: airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/kubernetes_executor.html
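
      For reference, a minimal sketch of the remote logging settings involved, shown as environment variables (the bucket path and connection ID here are hypothetical):

          AIRFLOW__LOGGING__REMOTE_LOGGING=True
          AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://my-log-bucket/airflow-logs
          AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=aws_default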

  • @aditya234567
    @aditya234567 3 months ago

    Say I have a DAG run happening and I dynamically update the DAG's tasks. Will it break the existing DAG run? Say that particular DAG run has 10 tasks to do and I update the DAG while it's doing the 1st task. Will it keep running the old tasks, and will a newly triggered DAG run use the new, updated tasks?

  • @PaulDeveaux
    @PaulDeveaux 3 months ago

    What I would love to see is how to do this when dbt and Airflow are in separate repositories. Given the different dependencies between Airflow and dbt, this seems like a common use case.

  • @ricardomagnomartins
    @ricardomagnomartins 4 months ago

    How cool, my son!!! Congratulations.

  • @ramdasvk0716
    @ramdasvk0716 4 months ago

    The best❤️‍🔥

  • @deborathaisrodriguesdelima5866
    @deborathaisrodriguesdelima5866 4 months ago

    Excellent 👏👏

  • @ledinhanhtan
    @ledinhanhtan 5 months ago

    Hi, will dag.test() work for complex tasks such as SparkSubmitOperator()? 🙏

  • @goodmanshawnhuang
    @goodmanshawnhuang 5 months ago

    Great job, thanks for sharing it.

  • @marcin2x4
    @marcin2x4 5 months ago

    Are the presented examples available in any code repo?

  • @shadabbigdel5017
    @shadabbigdel5017 5 months ago

    Thank you very much for the great presentation and hands-on session. We are going to use Airflow in EKS, and our development team needed a way to simulate their environment locally to test their DAGs during development and become familiar with Airflow on Kubernetes. Your guide was extremely helpful.

  • @bryanpolito8576
    @bryanpolito8576 6 months ago

    Thank you

  • @AnchalGupta-ek3wr
    @AnchalGupta-ek3wr 6 months ago

    After adding the Python file and the HTML file and restarting the web server, the plugin details are visible under the Admin > Plugins path, but the view is not populating in Cloud Composer. Is there anything else that needs to be done?

  • @AnchalGupta-ek3wr
    @AnchalGupta-ek3wr 6 months ago

    After adding the Python file and the HTML file, I restarted the web server and Postgres from Docker, but the view is not populating in my local Airflow. Is there anything else that needs to be done? I'm running Airflow from a Docker setup. My Airflow version is 1.10.15, pretty old, but I can't switch to a newer version right now.

  • @ap2394
    @ap2394 6 months ago

    Hi, is it possible to schedule a task using a dataset, or is that controlled at the DAG level? I mean, if I have 2 tasks in a downstream DAG, do I have the option to customize the schedule on the basis of a task's upstream dataset?

  • @spikeydude114
    @spikeydude114 6 months ago

    Do you have LinkedIn?

  • @pgrvloik
    @pgrvloik 6 months ago

    Great!

  • @rkenne1391
    @rkenne1391 7 months ago

    Can you provide more context on the batch inference pipeline? Airflow is an orchestrator, so would you need a different framework to perform the batch inference?

  • @snehal4520
    @snehal4520 7 months ago

    Very informative, thank you!

  • @amirhosseinsharifinejad7752
    @amirhosseinsharifinejad7752 7 months ago

    Really helpful, thank you 😍

  • @PaulChung-rg6jv
    @PaulChung-rg6jv 7 months ago

    Tons of information. Any chance this can be thrown in a GitHub repo for us engineers who need more time to digest?

  • @munyaradzimagodo3983
    @munyaradzimagodo3983 7 months ago

    Thank you, well explained. I created an Express application to create DAGs programmatically, but the endpoints are not working.

  • @CarbonsHDTuts
    @CarbonsHDTuts 7 months ago

    This is really awesome. I love the entire video and always love content from you guys and girls, but could I please give some constructive feedback?

  • @mettuvamshidhar1389
    @mettuvamshidhar1389 8 months ago

    Is it possible to get the list of variables pushed through xcom_push in the first task (extraction, let's say)? And can we pull that variables list with xcom_pull and have it as a dynamically mapped group (instead of A, B, C)?

  • @bilalmsd07
    @bilalmsd07 8 months ago

    What if any of the subtasks fails? How do you surface the error but also have the remaining parallel tasks run?

  • @yevgenym9204
    @yevgenym9204 8 months ago

    @Astronomer Please share a direct link to the CLI library you mention (for the proper file structure): ruclips.net/video/zVzBVpbgw1A/видео.htmlsi=HiJa9Afi-53yLZOG&t=873

    • @Astronomer
      @Astronomer 8 months ago

      You can find documentation on the Astro CLI, including download instructions, here: docs.astronomer.io/astro/cli/overview

  • @rohitnath5545
    @rohitnath5545 8 months ago

    Do we have a video on how to run Airflow using Docker on cloud containers? Running locally is fine to learn and test, but the real work is seeing how it runs in the cloud. I'm a consultant, and for my clients an easier setup is the goal; with Airflow I don't see that.

    • @Astronomer
      @Astronomer 8 months ago

      Astronomer provides a managed service for running Airflow at scale and in the cloud. You can learn more at astronomer.io/try-astro

  • @marehmanmarehman9431
    @marehmanmarehman9431 8 months ago

    Great work, keep it up.

  • @ryank8463
    @ryank8463 8 months ago

    Hi, this video is really beneficial. I have some questions about the best practice for handling data transmission between tasks. I am building MLOps using Airflow. My model-training DAG contains data preprocessing -> model training, so there would be massive data transmission between these 2 tasks. I am using XCom to transmit data between them, but there's a 2 GB limitation in XCom. So what's the best practice to deal with this problem? Using S3 to send/pull data from tasks? Or should I simply combine these 2 tasks (data preprocessing -> model training)? Thank you.

    • @Astronomer
      @Astronomer 8 months ago

      Thank you! For passing larger amounts of data between tasks you have two main options: a custom XCom backend or writing to intermediary storage directly from within the tasks. In general we recommend a custom XCom backend as a best practice in these situations, because you can keep your DAG code the same; the change happens in how the data sent to and retrieved from XCom is processed. You can find a tutorial on how to set up a custom XCom backend here: docs.astronomer.io/learn/xcom-backend-tutorial. Merging the tasks is generally not recommended because it makes it harder to get observability and rerun individual actions.
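
      As an illustration, a minimal sketch of a custom XCom backend that offloads JSON-serializable values to S3 (the bucket name and aws_default connection are hypothetical; it would be enabled via the AIRFLOW__CORE__XCOM_BACKEND setting):

          import json
          import uuid

          from airflow.models.xcom import BaseXCom
          from airflow.providers.amazon.aws.hooks.s3 import S3Hook


          class S3XComBackend(BaseXCom):
              PREFIX = "s3-xcom://"
              BUCKET_NAME = "my-xcom-bucket"  # hypothetical bucket

              @staticmethod
              def serialize_value(value, **kwargs):
                  # Write the payload to S3; keep only a short reference string
                  # in the Airflow metadata database.
                  key = f"xcom/{uuid.uuid4()}.json"
                  S3Hook(aws_conn_id="aws_default").load_string(
                      json.dumps(value), key=key, bucket_name=S3XComBackend.BUCKET_NAME
                  )
                  return BaseXCom.serialize_value(S3XComBackend.PREFIX + key)

              @staticmethod
              def deserialize_value(result):
                  # Resolve the reference back into the original payload.
                  ref = BaseXCom.deserialize_value(result)
                  if isinstance(ref, str) and ref.startswith(S3XComBackend.PREFIX):
                      key = ref[len(S3XComBackend.PREFIX):]
                      data = S3Hook(aws_conn_id="aws_default").read_key(
                          key=key, bucket_name=S3XComBackend.BUCKET_NAME
                      )
                      return json.loads(data)
                  return ref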

    • @ryank8463
      @ryank8463 8 months ago

      @@Astronomer Hi, thanks for your valuable reply. I would also like to ask what level of granularity we should aim for when allocating tasks. Since the more tasks there are, the more pushing/pulling of data from external storage happens, and when the data is large, that brings some network overhead.

  • @christianfernandez5717
    @christianfernandez5717 8 months ago

    Great video. I would also be interested in a webinar on scaling the Airflow database, since I'm having some difficulties of my own with that.

    • @Astronomer
      @Astronomer 8 months ago

      Noted, thanks for the suggestion! If it's helpful, you can check out our guide on the metadata db: docs.astronomer.io/learn/airflow-database. Using a managed service like Astro is also one way many companies avoid scaling issues with Airflow.

  • @dan-takacs
    @dan-takacs 9 months ago

    Great video. I'm trying to make this work with LivyOperator. Do you know if it can be expanded, or partial arguments supplied to it?

    • @Astronomer
      @Astronomer 9 months ago

      It should work. Generally you can map over any type of operator, but note that some parameters can't be mapped over (e.g. BaseOperator params). More here: docs.astronomer.io/learn/dynamic-tasks
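
      For example, a minimal sketch of dynamic task mapping over LivyOperator (the connection ID and Spark job files are hypothetical):

          from airflow.providers.apache.livy.operators.livy import LivyOperator

          # Constant arguments go in partial(); the mapped argument goes in expand().
          submit_jobs = LivyOperator.partial(
              task_id="submit_spark_job",
              livy_conn_id="livy_default",
              polling_interval=30,  # poll Livy every 30 seconds for the batch state
          ).expand(file=["/jobs/job_a.py", "/jobs/job_b.py"])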

  • @looklook6075
    @looklook6075 9 months ago

    32:29 Why is the "test" connection button disabled? So frustrating. Airflow makes it so hard to connect to anything; not intuitive at all. And your video just skipped how to enable "test" and asked me to contact my deployment admin. lol, I am the deployment admin. Can you show me how? I checked the website and the documentation is not helpful at all. I have been stuck for over a week on how to connect Airflow to an MSSQL Server.

    • @Astronomer
      @Astronomer 9 months ago

      The `test` connection button is disabled by default starting in Airflow 2.7 for security reasons. You can enable it by setting the test_connection core config to Enabled. docs.astronomer.io/learn/connections#test-a-connection. We also have some guidance on connecting to an MSSQL server, although the process can vary depending on your exact setup: docs.astronomer.io/learn/connections/ms-sqlserver
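
      For example, as an environment-variable override (equivalent to editing the [core] section of airflow.cfg; in an Astro CLI project a line like this can go in the project's .env file):

          AIRFLOW__CORE__TEST_CONNECTION=Enabled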

    • @quintonflorence6492
      @quintonflorence6492 8 months ago

      @@Astronomer Hi, where can I find the core config to make this update? I'm currently using the Astro CLI and I'm not seeing this setting in the two .yaml files in the project. Thank you.

  • @pichaibravo
    @pichaibravo 9 months ago

    Is it good to return a df many times in Airflow?

    • @Astronomer
      @Astronomer 9 months ago

      It's generally fine to pass dataframes between your Airflow tasks, as long as you make sure your infrastructure can support the size of your data. If you use XCom, it's a good idea to consider a custom XCom backend for managing dataframes, as Airflow's metadata db isn't set up for this specifically.

  • @ziedsalhi4503
    @ziedsalhi4503 9 months ago

    Hi, I already have an existing Airflow project. How can I use the Astro CLI to run it?

  • @greatotool
    @greatotool 10 months ago

    Is the Git repository public?

    • @Astronomer
      @Astronomer 10 months ago

      Yes! You can find it here: github.com/astronomer/webinar-demos/tree/best-practices-prod

    • @greatotool
      @greatotool 9 months ago

      Thanks!! 🙂 @@Astronomer

  • @KirillP-b1v
    @KirillP-b1v 10 months ago

    Please share the repository.

    • @Astronomer
      @Astronomer 10 months ago

      The repo is here: github.com/astronomer/webinar-demos/tree/best-practices-prod

  • @mcpiatkowski
    @mcpiatkowski 10 months ago

    That is a great intro and overview of Airflow for beginners! I very much like the dataset concepts and the ability to see data lineage. However, I haven't found a solution for how to make a triggered, dataset-aware pipeline execute with the parent DAG's execution date. Is it even possible at the moment?

    • @Astronomer
      @Astronomer 10 months ago

      Thanks! And that is a great question. It is not possible for the downstream Dataset-triggered DAG to have the same logical_date (the new parameter equivalent to the old execution_date) as the DAG that caused the update to the dataset, but it is possible to pull that date in the downstream DAG by accessing context["triggering_dataset_events"]:

          @task
          def print_triggering_dataset_events(**context):
              triggering_dataset_events = context["triggering_dataset_events"]
              for dataset, dataset_list in triggering_dataset_events.items():
                  print(dataset, dataset_list)
                  print(dataset_list[0].source_dag_run.logical_date)

          print_triggering_dataset_events()

      If you use the above in your downstream DAG you can get that logical_date/execution_date to use in your Airflow tasks. For more info and an example with Jinja templating see: airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html#fetching-information-from-a-triggering-dataset-event

    • @mcpiatkowski
      @mcpiatkowski 9 months ago

      @@Astronomer That is amazing! You are my hero for life! Thank you!

  • @veereshk6065
    @veereshk6065 10 months ago

    Hi, thank you for the detailed demo. I just started exploring dynamic task mapping, and I have a requirement where I need to get data from a metadata table and create a list of dictionaries:

        [
            { 'colA': 'valueA', 'colB': 'valueB', 'colC': 'valueC', 'colD': 'valueD' },
            { 'colA': 'valueA', 'colB': 'valueB', 'colC': 'valueC', 'colD': 'valueD' },
            { 'colA': 'valueA', 'colB': 'valueB', 'colC': 'valueC', 'colD': 'valueD' },
        ]

    The above structure can be generated using fetch_metadata_task (a combination of BigQueryHook and PythonOperator). Now the question is: how do I generate the dynamic tasks from the above list of dictionaries? For each dictionary I want to perform a set of tasks, e.g. GCSToBigQueryOperator, BigQueryValueCheckOperator, BigQueryToBigQueryCopyOperator, etc. The sample DAG dependency looks like this:

        start_task >> fetch_metadata_task
        fetch_metadata_task >> [GCSToBigQueryOperator_table1 >> BigQueryValueCheckOperator_table1 >> BigQueryToBigQueryCopyOperator_table1 >> connecting_dummy_task]
        fetch_metadata_task >> [GCSToBigQueryOperator_table2 >> BigQueryValueCheckOperator_table2 >> BigQueryToBigQueryCopyOperator_table2 >> connecting_dummy_task]
        fetch_metadata_task >> [GCSToBigQueryOperator_table3 >> BigQueryValueCheckOperator_table3 >> BigQueryToBigQueryCopyOperator_table3 >> connecting_dummy_task]
        connecting_dummy_task >> BigQueryExecuteTask >> end_task

  • @ayushikhanna1094
    @ayushikhanna1094 10 months ago

    Is there any option available in the Airflow UI to auto-trigger?

  • @78salieri78
    @78salieri78 10 months ago

    Great video, with many examples, much appreciated!