Apache Airflow
  • 351 videos
  • 957,257 views
Hello Quality: Building CIs to run providers packages system tests
Presented by Freddy Demiane, Rahul Vats & Dennis Ferruzzi at Airflow Summit 2024.
Airflow operators are a core feature of Apache Airflow, and it's extremely important that we maintain their quality and prevent regressions. At the same time, we want to give developers automated test results so they can double-check that their changes don't introduce regressions or backward-incompatible changes, and to give Airflow release managers the information they need to decide whether a given version of a provider is ready to be released.
Recently, a new approach to assuring production quality was implemented for AWS, Google, and Astronomer-provided operators: standalone Continuous Integration processes were config...
29 views
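The description above frames provider system tests as a release signal. As a rough illustration only, the sketch below turns one run of system-test results into a per-provider "ready / not ready" decision. It assumes a hypothetical `SystemTestResult` record and a simple rule: any failing test blocks release, and a failure of a test that passed in a baseline run (for example, the last released provider version) is also reported as a regression. The provider and test names are made up, and this is not the CI tooling the speakers describe.

```python
from dataclasses import dataclass


@dataclass
class SystemTestResult:
    """Hypothetical record of one system-test outcome for a provider."""
    provider: str
    test_name: str
    passed: bool


def release_report(current, baseline):
    """Map each provider to (ready_to_release, regressions).

    Any failing test blocks release. A failing test that passed in the
    baseline run is additionally listed as a regression.
    """
    passed_before = {(r.provider, r.test_name): r.passed for r in baseline}
    report = {}
    for r in current:
        ready, regressions = report.get(r.provider, (True, []))
        if not r.passed:
            ready = False
            if passed_before.get((r.provider, r.test_name), False):
                regressions.append(r.test_name)
        report[r.provider] = (ready, regressions)
    return report


baseline = [
    SystemTestResult("google", "example_gcs", True),
    SystemTestResult("amazon", "example_s3", True),
]
current = [
    SystemTestResult("google", "example_gcs", False),
    SystemTestResult("google", "example_bigquery", False),
    SystemTestResult("amazon", "example_s3", True),
]
print(release_report(current, baseline))
# {'google': (False, ['example_gcs']), 'amazon': (True, [])}
```

Here `example_bigquery` blocks the Google release but is not flagged as a regression, because it has no passing baseline to regress from.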

Videos

How the Airflow Community Productionizes AI
46 views, 2 hours ago
Presented by Pete DeJoy at Airflow Summit 2024. Every data team out there is being asked from their business stakeholders about Generative AI. Taking LLM centric workloads to production is not a trivial task. At the foundational level, there are a set of challenges around data delivery, data quality, and data ingestion that mirror traditional data engineering problems. Once you’re past those, t...
Refactoring DAGs: From duplication to delightful efficiency with a centralized library
203 views, 2 hours ago
Presented by Gil Reich at Airflow Summit 2024. Feeling trapped in a maze of duplicate Airflow DAG code? We were too! That’s why we embarked on a journey to build a centralized library, eliminating redundancy and unlocking delightful efficiency. Join us as we share: - The struggles of managing repetitive code across DAGs - Our approach to a centralized library, revealing design and implementatio...
Customizing LLMs: Leveraging technology to tailor GenAI using Airflow
18 views, 2 hours ago
Presented by Vincent La, Jim Howard & Moulay Zaidane Al Bahi Draidia at Airflow Summit 2024. Laurel provides an AI-driven timekeeping solution tailored for accounting and legal firms, automating timesheet creation by capturing digital work activities. This session highlights two notable AI projects: 1. UTBMS Code Prediction: Leveraging small language models, this system builds new embeddings to...
Using Airflow Operational Data to Optimize Cloud Services
51 views, 2 hours ago
Presented by Olivier Daneau at Airflow Summit 2024. Cost management is a continuous challenge for our data teams at Astronomer. Understanding the expenses associated with running our workflows is not always straightforward, and identifying which process ran a query causing unexpected usage on a given day can be time-consuming. In this talk, we will showcase an Airflow Plugin and specific DAGs d...
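Identifying "which process ran a query" becomes a simple aggregation once each warehouse query is tagged with the DAG and task that issued it. A minimal sketch, assuming a hypothetical query log whose rows carry `day`, `dag_id`, `task_id`, and `cost_usd` fields; how queries get tagged is an assumption here, and this is not the plugin shown in the talk.

```python
from collections import defaultdict


def cost_by_task(query_log, day):
    """Total query cost per (dag_id, task_id) on one day, most expensive first.

    Each row of the hypothetical query_log is a dict with the keys
    'day', 'dag_id', 'task_id' and 'cost_usd'.
    """
    totals = defaultdict(float)
    for row in query_log:
        if row["day"] == day:
            totals[(row["dag_id"], row["task_id"])] += row["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


log = [
    {"day": "2024-09-10", "dag_id": "sales", "task_id": "load", "cost_usd": 1.5},
    {"day": "2024-09-10", "dag_id": "sales", "task_id": "load", "cost_usd": 2.0},
    {"day": "2024-09-10", "dag_id": "ml", "task_id": "train", "cost_usd": 0.25},
    {"day": "2024-09-11", "dag_id": "ml", "task_id": "train", "cost_usd": 9.0},
]
print(cost_by_task(log, "2024-09-10"))
# [(('sales', 'load'), 3.5), (('ml', 'train'), 0.25)]
```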
Empowering Business Analysts with DAG Authoring IDE running 8000 workflows
17 views, 2 hours ago
Presented by Daniil Dubin at Airflow Summit 2024. At Wix, more often than not, business analysts build workflows themselves so that data engineers don't become a bottleneck. But how do you enable them to create SQL ETLs that start when their dependencies are ready and send emails or refresh Tableau reports when the work is done? One simple answer may be to use Airflow. The problem is every BA cannot be e...
AIP 63: DAG versioning, where are we?
15 views, 2 hours ago
Presented by Jed Cunningham at Airflow Summit 2024. Join us as we check in on the current status of AIP-63: DAG Versioning. This session will explore the motivations behind AIP-63, the challenges faced by Airflow users in understanding and managing DAG history, and how it aims to address them. From tracking TaskInstance history to improving DAG representation in the UI, we’ll examine what we’ve...
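One way to picture "DAG history" is to fingerprint a DAG's structure and record a new version only when that fingerprint changes. The sketch below assumes a DAG is represented as a plain dict of tasks and edges and hashes its canonical JSON with SHA-256; `DagHistory` is a hypothetical in-memory store, not the AIP-63 design or Airflow's own data model.

```python
import hashlib
import json


def dag_fingerprint(dag_structure):
    """SHA-256 of the DAG's canonical JSON; sort_keys makes key order irrelevant."""
    payload = json.dumps(dag_structure, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class DagHistory:
    """Hypothetical store that adds a version only when the structure changes."""

    def __init__(self):
        self.versions = []  # list of (version_number, fingerprint)

    def record(self, dag_structure):
        """Return the version number that this structure belongs to."""
        fingerprint = dag_fingerprint(dag_structure)
        if self.versions and self.versions[-1][1] == fingerprint:
            return self.versions[-1][0]  # unchanged since the last parse
        version = len(self.versions) + 1
        self.versions.append((version, fingerprint))
        return version


history = DagHistory()
v1 = {"tasks": ["extract", "load"], "edges": [["extract", "load"]]}
v2 = {"tasks": ["extract", "transform", "load"],
      "edges": [["extract", "transform"], ["transform", "load"]]}
print(history.record(v1), history.record(v1), history.record(v2))  # 1 1 2
```

Two simplifications are deliberate: list order still affects the hash, and reverting to an older structure creates a new version rather than reusing the old one, since only the latest fingerprint is compared.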
Seeing Clearly with Airflow: The shift to data-aware orchestration
19 views, 2 hours ago
Presented by Constance Martineau & Tzu-ping Chung at Airflow Summit 2024. As Apache Airflow evolves, a key shift is emerging: the move from task-centric to data-aware orchestration. Traditionally, Airflow has focused on managing tasks efficiently, with limited visibility into the data those tasks manipulate. However, the rise of data-centric workflows demands a new approach, one that puts data a...
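Data-aware scheduling can be pictured as a consumer that runs once every dataset it depends on has been updated since its last run. In Airflow 2.4 and later this is expressed by passing a list of `Dataset` objects to a DAG's `schedule`; the toy model below only mimics that triggering rule in plain Python, so `DatasetScheduler` and its methods are hypothetical and the URIs are placeholders.

```python
class DatasetScheduler:
    """Toy model of data-aware triggering (not Airflow's implementation)."""

    def __init__(self):
        self.consumers = {}  # consumer name -> set of dataset URIs it depends on
        self.pending = {}    # consumer name -> URIs updated since its last run

    def register(self, consumer, uris):
        self.consumers[consumer] = set(uris)
        self.pending[consumer] = set()

    def dataset_updated(self, uri):
        """Record a producer's update and return consumers that should run now."""
        triggered = []
        for consumer, deps in self.consumers.items():
            if uri in deps:
                self.pending[consumer].add(uri)
                if self.pending[consumer] == deps:
                    triggered.append(consumer)
                    self.pending[consumer] = set()  # start waiting again
        return triggered


scheduler = DatasetScheduler()
scheduler.register("daily_report", ["s3://bucket/orders", "s3://bucket/customers"])
print(scheduler.dataset_updated("s3://bucket/orders"))     # []
print(scheduler.dataset_updated("s3://bucket/orders"))     # []
print(scheduler.dataset_updated("s3://bucket/customers"))  # ['daily_report']
```

Repeated updates to the same dataset do not trigger the consumer early; it waits until every dependency has reported at least once.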
Gen AI using Airflow 3: A vision for Airflow RAGs
27 views, 2 hours ago
Presented by Kaxil Naik & Ash Berlin-Taylor at Airflow Summit 2024. Gen AI has taken the computing world by storm. As enterprises and startups have started to experiment with LLM applications, it has become clear that providing the right context to these LLM applications is critical. This process, known as retrieval-augmented generation (RAG), relies on adding custom data to the large language mo...
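The retrieval step of RAG, picking the documents most relevant to a question and placing them in the prompt, can be sketched in a few lines. This toy version scores documents with bag-of-words cosine similarity as a stand-in for the learned embeddings and vector store a real pipeline would use; `build_prompt` and its prompt wording are illustrative assumptions, and no LLM is called.

```python
import math
import re
from collections import Counter


def _bag_of_words(text):
    """Lower-cased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def build_prompt(question, documents, k=2):
    """Put the k documents most similar to the question into the prompt."""
    q = _bag_of_words(question)
    ranked = sorted(documents, key=lambda d: _cosine(q, _bag_of_words(d)), reverse=True)
    context = "\n".join(f"- {doc}" for doc in ranked[:k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


docs = [
    "Airflow schedules DAG runs on a timetable",
    "Pizza dough needs yeast and flour",
    "Airflow tasks run on executors such as Celery or Kubernetes",
]
print(build_prompt("How does Airflow run tasks?", docs, k=2))
```

In an Airflow pipeline, the indexing of `documents` is the part that would typically run as scheduled tasks, while the retrieval shown here happens at query time.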
DAGify: Enterprise schedule migration accelerator for Airflow
13 views, 2 hours ago
Presented by Konrad Schieban at Airflow Summit 2024. DAGify is a highly extensible, template-driven enterprise scheduler migration accelerator that helps organizations speed up their migration to Apache Airflow. While DAGify does not claim to migrate 100% of existing scheduler functionality, it aims to heavily reduce the manual effort it takes for developers to convert their enterprise schedule...
The Silent Symphony: Keeping Airflow's CI/CD and Dev Tools in Tune
14 views, 2 hours ago
Presented by Jarek Potiuk at Airflow Summit 2024. Apache Airflow relies on a silent symphony behind the scenes: its CI/CD (Continuous Integration/Continuous Delivery) and development tooling. This presentation explores the critical role these tools play in keeping Airflow efficient and innovative. We’ll delve into how robust CI/CD ensures bug fixes and improvements are seamlessly integrated, wh...
Lessons from the Ecosystem: What can Airflow Learn from Other Open-source Communities?
3 views, 2 hours ago
Presented by Michael Robinson at Airflow Summit 2024. The Apache Airflow community is so large and active that it’s tempting to take the view that “if it ain’t broke don’t fix it.” In a community as in a codebase, however, improvement and attention are essential to sustaining growth. And bugs are just as inevitable in community management as they are in software development. If only the fixes w...
Scalable Development of Event Driven Airflow DAGs
21 views, 2 hours ago
Presented by Ipsa Trivedi & Subramanian Vellaiyan at Airflow Summit 2024. This use case shows how we deal with data of different varieties from different sources. Each source sends data with different layouts, timings, structures, location patterns, and sizes. The goal is to process the files within SLA and send them out. This is a complex multi-step processing pipeline that involves multiple Spark jobs, ...
Empowering Airflow Users: A framework for performance testing and transparent resource optimization
16 views, 2 hours ago
Presented by Bartosz Jankiewicz at Airflow Summit 2024. Apache Airflow is the backbone of countless data pipelines, but optimizing performance and resource utilization can be a challenge. This talk introduces a novel performance testing framework designed to measure, monitor, and improve the efficiency of Airflow deployments. I’ll delve into the framework’s modular architecture, showcasing how ...
Scale and Security: How Autodesk securely develops and tests PII pipelines with Airflow
7 views, 2 hours ago
Presented by Bhavesh Jaisinghani at Airflow Summit 2024. In today’s data-driven era, ensuring data reliability and enhancing our testing and development capabilities are paramount. Local unit testing has its merits but falls short when dealing with the volume of big data. One major challenge is running Spark jobs pre-deployment to ensure they produce expected results and handle production-level...
Connecting the Dots in Airflow
16 views, 2 hours ago
Adaptive Memory Scaling for Robust Airflow Pipelines
10 views, 2 hours ago
How we use Airflow at Booking to Orchestrate Big Data workflows
49 views, 2 hours ago
Unlocking the Power of Airflow Beyond Data Engineering at Cloudflare
8 views, 2 hours ago
Mastering Advanced Dataset Scheduling in Apache Airflow
12 views, 2 hours ago
Simplified User Management in Airflow
19 views, 2 hours ago
OpenLineage: From operators to hooks
14 views, 2 hours ago
What if? Running Airflow tasks without workers
5 views, 2 hours ago
Hybrid Executors: Have your cake and eat it too
10 views, 2 hours ago
The Essentials of Custom Executor Development
14 views, 2 hours ago
Why do Airflow tasks fail?
22 views, 2 hours ago
A New DAG Paradigm: Less Airflow more DAGs
12 views, 2 hours ago
Behaviour Driven Development in Airflow
19 views, 2 hours ago
Airflow as a workflow for self service based ingestion
31 views, 2 hours ago
How we run 100 Airflow environments and millions of tasks as a part time job using Kubernetes
93 views, 2 hours ago

Comments

  • @ddyoder (2 hours ago)

    "The worst thing about technical debt is that it's repaid in compounding interest" 1:57

  • @gunhanoral9434 (1 day ago)

    Executor source QR code link leads to a 404 on GitHub.

  • @juvewan (1 day ago)

    Thanks for sharing the framework 🎉

  • @ethanstone5138 (1 day ago)

    This is my favorite famous person on all of YouTube

  • @ethanstone5138 (1 day ago)

    I love this guy!

  • @juvewan (12 days ago)

    What Airflow version are you using? Or is it the GCP managed airflow?

  • (20 days ago)

    Hey everyone, great demo! I really like the idea of covering all possible scenarios. Is the example code available in a repository somewhere?

  • @pranaygawas709 (23 days ago)

    Can we access the code base used in this presentation anywhere?

  • @veereshk6065 (24 days ago)

    Can you please share the DAG files used in these sessions?

  • @whemmakatatt5311 (27 days ago)

    Making a wish to get a 24h version of this talk. Today could be my birthday 🎂 Think about it, Ethan.

  • @whemmakatatt5311 (27 days ago)

    I don't have a PhD in Airflow, and this feels like only half of the talk or less 😅 Is it just me?

  • @PhilippeGrohrock (27 days ago)

    Been using EDA for close to 10y now and I was wondering how to do it correctly in Airflow, and this presentation really helped me a lot. Thanks! :)

  • @danhorus (28 days ago)

    6:36 Interesting. So the official Docker image uses version numbers that look like semver, but only in relation to part of the image. Considering the principle of least astonishment, that's actually worse than using an incremental number or a date.

  • @whemmakatatt5311 (28 days ago)

    Thanks a lot, Bonnie! Great content! I learnt a lot from watching and will continue to learn when rewatching. Feel free to share your knowledge in more detail; you explain it so well :D

  • @MarcLamberti (1 month ago)

    Another amazing presentation!

  • @whemmakatatt5311 (1 month ago)

    A bit unfortunate that this content isn't liked by the YouTube algo that much. Pay up or something xD Or make shock faces for the thumbnails 😂 Great content! I feel lucky to have found this 🎉

  • @DavidManouchehri (1 month ago)

    One of the best production summaries I’ve seen, thanks!

  • @thumbox (1 month ago)

    "it's easy to point out an issue, but better to provide a solution". This is absolutely true. It's painful when we need to interpret what the reviewers want to say.

  • @andyvandenberghe6364 (1 month ago)

    Is the use case for reverse ETL still the same in 2024?

  • @Eriddoch (1 month ago)

    In my experience: Airflow prevents DS from iterating rapidly. And Metaflow enables that. But running 2 tools becomes a lot to manage (or pay for). The fact that you can get the DS DevEx of Metaflow, but run on top of Airflow is incredible!

  • @rembautimes8808 (2 months ago)

    Very innovative feature, customers will definitely appreciate it

  • @jacobogonzalezvargas9924 (2 months ago)

  • @hasancemreok2597 (3 months ago)

    You really don’t want us to use custom transformations in Airbyte: you put dbt in the video's title and gave it a separate slide, but you only say one little sentence about it in the whole video. There’s nothing about what you transformed or how you transformed it. Interesting. Btw, the video might be 2 years old, but it still feels quite new to me.

  • @avinashchavhanwebadict (3 months ago)

    How do you skip the task group?

  • @RickyZhang-p7k (4 months ago)

    There is no technical detail in the presentation.

  • @nikolaibarinov8660 (5 months ago)

    Why does a Kubernetes Pod operator require pairs of nodes? Pods != nodes

  • @sanjana2584 (5 months ago)

    Hi, I followed the first and second steps, but the upgrade command was taking a very long time.

  • @edithpuclla6188 (5 months ago)

    bravoo!!!

  • @talshor5198 (6 months ago)

    Excellent presentation, thank you!

  • @fraternitas5117 (6 months ago)

    make a sandwich.

  • @samiStarh (6 months ago)

    Impressive! Bravo!

  • @archanareddy651 (6 months ago)

    How do we upgrade from 2.2.2 when the latest is 2.7.2?

  • @yemmey4ever (7 months ago)

    Thank you for all you do for the community! All of you!!👏👏👏

  • @SaraH-fd4si (7 months ago)

    Does anyone know how he did the interactive filtering?

  • @alfahatasi (8 months ago)

    I want to mask some data. How do you do this via Superset?

  • @basamahmad2464 (8 months ago)

    Can you install Superset on Windows Server 2019?

  • @lucasbraga2649 (8 months ago)

    I'm trying to run pip install etsy-dagtest in a new virtual environment with Python 3.9.12, and it's not working. Any ideas on how to solve it?

    • @DodaGarcia (6 months ago)

      From what I understand this is currently an internal tool, they're just showcasing the workflow.

  • @MorbidRotten (8 months ago)

    Nice content! Could you please provide the source code?

  • @tas9676 (8 months ago)

    What a gem! Thanks for sharing!!!!

  • @nataliamora8344 (8 months ago)

    Has the project been published anywhere?

  • @MinhPhamCong-t5c (8 months ago)

    Thank you!!!!

  • @danielbartley516 (9 months ago)

    On one level, this is very cool. On another, since you can’t install the proprietary libraries, this video gets your hopes up and then disappoints you.

  • @oklander1 (9 months ago)

    Zohar and Alina, great job!!!

  • @danielbartley516 (9 months ago)

    So many foot guns

  • @samplebricks234 (10 months ago)

    Airflowctl doesn't work on Win10, does it?

  • @digitallworld (11 months ago)

    Very helpful. Thanks.

  • @ozzyoz6824 (11 months ago)

    Thank you!

  • @caseygarrison6733 (11 months ago)

    'Promo sm'

  • @edithpuclla6188 (1 year ago)

    Bravooo!!