Build your first pipeline DAG | Apache Airflow for beginners

  • Published: 26 Jul 2024
  • #apacheairflow #airflowforbeginners #datapipeline #etl #etlpipeline #airflow2
    Welcome to the Apache Airflow for beginners series. In this video, you'll learn how to build your first pipeline DAG in Apache Airflow. As a Data Engineer, you will be tasked with writing a data aggregation and cleaning pipeline. We are going to build an ETL pipeline as part of an Airflow DAG, with a real-world example of hotel reservation records.
    Read at maxcotec.com/learning/apache-...
    ---------------------------------------------------
    Get the complete code from here:
    github.com/maxcotec/Apache-Ai...
    Watch next video
    Airflow Macros
    • Use Apache Airflow Mac...
    Watch the previous video here:
    Run Airflow locally via Docker
    • Run Airflow 2.0 via Do...
    --------------------------------------------------------
    Timeline
    00:00 intro
    00:13 definition
    00:40 DAG building blocks
    01:39 Example - ingestion pipeline overview
    03:05 Let's code DAG
    07:53 Let's deploy DAG
    09:29 Examine DAG output
    10:25 Ingest more data
    11:17 Outro
    --------------------------------------------------------
    Learn Something New Below:
    maxcotec.com/blog/is-a-stunde...
    maxcotec.com/blog/pv-center-f...
    👍 If this video was helpful to you, please don't forget to thumbs-up,
    ⌨️ leave a comment and
    🔗 share with your friends.
    ⭐ Support our work by hitting that subscribe button.
    🔔 hit bell icon to stay alert on more upcoming interesting stuff.
    Stay tuned !
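For readers skimming the description, the clean-and-load steps the video builds can be sketched roughly as two plain Python callables (the kind an Airflow PythonOperator task would wrap). The file names, column names, and drop-empty-id rule below are hypothetical stand-ins for the hotel reservation CSVs used in the video:

```python
import csv
import sqlite3

def transform_data(src_path: str, out_path: str) -> None:
    """Clean the raw records: drop rows that are missing a booking id."""
    with open(src_path, newline="") as src:
        rows = [r for r in csv.DictReader(src) if r.get("booking_id")]
    if not rows:
        return
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

def load_data(csv_path: str, db_path: str) -> None:
    """Load the cleaned CSV into a SQLite table, as the video's pipeline does."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS bookings (booking_id TEXT, hotel TEXT)")
        conn.executemany("INSERT INTO bookings VALUES (:booking_id, :hotel)", rows)
```

In the actual DAG each function becomes its own task, so Airflow can schedule, retry, and log the transform and load steps independently.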

Comments • 32

  • @MrMegabeat
    @MrMegabeat 1 year ago +2

    My best 10mins investment in the morning! 🎉

  • @demohub
    @demohub 1 year ago +3

    This video has definitely given me a better understanding of Airflow, and I now have some ideas on how to use it more effectively for projects.

  • @muhammedalbayati
    @muhammedalbayati 2 years ago +1

    Thanks a lot. Very good tutorials. Thumbs-up and subscribe

  • @downeytang7006
    @downeytang7006 2 years ago +2

    A quick question: if you have two different date formats from two CSV files, after performing concat, is there a way to unify the date format? For example, '2021-01-13' and '01-20-2022'.

    • @maxcoteclearning
      @maxcoteclearning 2 years ago

      Thanks for watching. Have a look at dateutil.parser. Check out more answers here: stackoverflow.com/a/40800072/5167801
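A stdlib-only sketch of the idea behind the suggested dateutil.parser, assuming the only two formats in play are the ones from the question, is to try each known format in turn:

```python
from datetime import datetime

# The two formats from the question: '2021-01-13' and '01-20-2022'.
KNOWN_FORMATS = ("%Y-%m-%d", "%m-%d-%Y")

def normalize_date(value: str) -> str:
    """Return the date in ISO form, whichever source format it arrived in."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # wrong format, try the next one
    raise ValueError(f"Unrecognised date format: {value!r}")
```

After the concat you would map this over the date column, e.g. `df["date"] = df["date"].map(normalize_date)`. Note that guessing between genuinely ambiguous formats (is 01-02-2022 Jan 2 or Feb 1?) is only safe when you know which file each row came from.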

  • @najmuddin7506
    @najmuddin7506 1 year ago +1

    Thanks for the tutorial! However, could you explain what the difference is between running this workflow as an Airflow DAG and simply running the program and calling both functions sequentially?

    • @maxcoteclearning
      @maxcoteclearning 1 year ago

      For this single, simple workflow it's really easy to manage, so you may not need Airflow. But it becomes really hard to manage and maintain if you have 100+ complex workflows. Watch the first video of this series, where I explain what problem Airflow solves (ruclips.net/video/56GDKurqhCo/видео.html).

  • @parikshitchavan2211
    @parikshitchavan2211 1 year ago

    Hello, thanks for such a great tutorial; you made everything smooth like butter. Just one question: whenever we make a new DAG, will we have to add docker-compose-CeleryExecutor, docker-compose-LocalExecutor, and a config for that particular DAG?

    • @maxcoteclearning
      @maxcoteclearning 1 year ago

      Thanks Parikshit. Only one executor can be used at a time. You can add multiple DAGs while keeping a single executor with the same config file.

  • @victoriwuoha3081
    @victoriwuoha3081 2 years ago +1

    @MaxcoTec Please, do you have any resource on how I can read data from an API, perform some similar processing, and finally write to a destination SQL Server? I'll be grateful if you could advise.

    • @maxcoteclearning
      @maxcoteclearning 2 years ago

      You can use the most popular Python library, requests (docs.python-requests.org/en/latest/), to fetch data from any API. They have good examples under the Quickstart section. Hope that helps :)
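A rough end-to-end sketch of the asked-about flow, kept to the standard library (urllib stands in for the requests library mentioned in the reply; the endpoint, field names, and table layout are made-up placeholders, and SQLite stands in for SQL Server):

```python
import json
import sqlite3
import urllib.request

def fetch_json(url: str) -> list[dict]:
    """GET a JSON array from the API; requests.get(url).json() does the same job."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def to_rows(records: list[dict]) -> list[tuple]:
    """Keep only the fields worth persisting (the field names here are invented)."""
    return [(r["id"], r["name"]) for r in records]

def write_rows(rows: list[tuple], db_path: str) -> None:
    """Write to SQLite; for SQL Server you would swap in a pyodbc/SQLAlchemy connection."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS api_data (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO api_data VALUES (?, ?)", rows)
    conn.commit()
    conn.close()
```

Each of the three functions maps naturally onto one Airflow task (extract, transform, load).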

    • @victoriwuoha3081
      @victoriwuoha3081 2 years ago

      @@maxcoteclearning Thank You, I'll try that out

  • @nghianguyen9439
    @nghianguyen9439 2 years ago +1

    Thanks for the very good videos. Can you give me some instructions about an example data pipeline with MongoDB?

    • @maxcoteclearning
      @maxcoteclearning 2 years ago

      You're welcome. Sure, can you explain more about your pipeline? What's the data flow (source/destination)? Are you persisting data into MongoDB, or extracting out of it? Have you looked at this:
      github.com/airflow-plugins/mongo_plugin/blob/master/operators/s3_to_mongo_operator.py

    • @nghianguyen9439
      @nghianguyen9439 2 years ago

      @@maxcoteclearning I am trying to do a data sync between 2 separate MongoDB instances, or simply read a CSV file and import it into MongoDB
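For the simpler CSV-to-MongoDB case, a minimal sketch; the connection URI, database, and collection names below are placeholders, and pymongo (a third-party package) is assumed only for the insert half:

```python
import csv

def read_csv_records(path: str) -> list[dict]:
    """Each CSV row becomes one MongoDB document (keys = header names)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def load_into_mongo(records: list[dict], uri: str = "mongodb://localhost:27017") -> None:
    """Insert the documents with pymongo; the import is kept inside the
    function so the CSV half runs even where pymongo isn't installed."""
    from pymongo import MongoClient  # third-party: pip install pymongo
    client = MongoClient(uri)
    client["mydb"]["bookings"].insert_many(records)
```

Because DictReader already yields one dict per row, no extra reshaping is needed before `insert_many`.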

  • @FedericoLov
    @FedericoLov 1 year ago

    Good video, but it seems that the actual transformations are done in pandas, while Airflow only provides a layer of logging and task scheduling.

    • @maxcoteclearning
      @maxcoteclearning 1 year ago +1

      That's true. Airflow is a workflow management tool. I've just used a simple ETL operation to show how it can be deployed and managed using Airflow.

  • @ammadkhan4687
    @ammadkhan4687 3 months ago

    How can I access the Airflow container to add more DAGs when it is hosted on another server?

    • @maxcoteclearning
      @maxcoteclearning 3 months ago

      Hi Ammad, could you explain what 'when I am hosting to another server' means?

    • @ammadkhan4687
      @ammadkhan4687 3 months ago +1

      @@maxcoteclearning Suppose I have a Docker hosting server and I connect to it remotely. How can we, as a team, create more DAGs to work on this hosted server? For example, hosting the Airflow Docker container in Azure cloud or on an on-premise Docker hosting server.

  • @muhammedalbayati
    @muhammedalbayati 2 years ago

    Please, how can I save this CSV data to an MS SQL Server database?

    • @maxcoteclearning
      @maxcoteclearning 2 years ago +1

      It will be similar to the way we are loading data into the SQLite database: pandas_df.to_sql("table_name", engine)
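Spelling that reply out a little (a sketch only: the DataFrame contents are invented, SQLite stands in for the target database, and the commented-out SQL Server connection string is a placeholder):

```python
import sqlite3
import pandas as pd

# Hypothetical cleaned booking records standing in for the video's CSV data.
df = pd.DataFrame({"booking_id": [1, 2], "hotel": ["Ritz", "Hilton"]})

# For MS SQL Server you would build a SQLAlchemy engine instead, e.g.:
#   engine = create_engine("mssql+pyodbc://user:pass@host/db?driver=ODBC+Driver+17+for+SQL+Server")
#   df.to_sql("bookings", engine, index=False, if_exists="replace")
# The to_sql call looks the same against a local SQLite connection:
conn = sqlite3.connect(":memory:")
df.to_sql("bookings", conn, index=False, if_exists="replace")
```

`if_exists` controls what happens when the table already exists ("fail", "replace", or "append"); for a pipeline that re-runs on a schedule, "append" is usually the one you want.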

    • @muhammedalbayati
      @muhammedalbayati 2 years ago

      @@maxcoteclearning Thanks

  • @riyasingh2515
    @riyasingh2515 2 years ago

    My tasks are failing in the Airflow UI. Can you tell me why that is happening? I copied all your code properly.

    • @maxcoteclearning
      @maxcoteclearning 2 years ago

      Hi Riya, may I know what exact errors you are getting?

    • @diptimanraichaudhuri6477
      @diptimanraichaudhuri6477 2 years ago +1

      I was also getting a DAG failure initially with the GitHub code sample. It turns out there is a variable "file_date_path" in the transform_data method, which gets constructed from the op_args passed to the DAG. So unless the file is kept in the same folder hierarchy, the booking read will fail. Please keep your booking .csv in the hierarchy "raw_data/<date>/<hour>/" and it will start working. Otherwise, you can modify the code where it reads from that folder and just plainly read and write without dates in the folder names.
      It is rare to find such a well-laid-out series. Kudos MaxcoTec!
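The folder layout this comment describes (and the path in the FileNotFoundError reported elsewhere in the thread) can be sketched as a small helper; the function name and base directory here are illustrative, not the repo's exact code:

```python
from datetime import datetime

def build_file_path(base_dir: str, run_ts: datetime, filename: str = "booking.csv") -> str:
    """Build the raw_data/<date>/<hour>/booking.csv path the DAG reads from."""
    return f"{base_dir}/{run_ts:%Y-%m-%d}/{run_ts.hour}/{filename}"
```

So a run timestamped 2022-12-13 at hour 5 would look for /opt/airflow/raw_data/2022-12-13/5/booking.csv, which is why the CSV must sit under a folder matching the run's date and hour.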

  • @orick92
    @orick92 1 year ago +4

    You should delete "beginners" from the headline...

  • @TheVickramsharma
    @TheVickramsharma 1 year ago +1

    Hi @MaxcoTec, I tried running this example and am getting this error: FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/raw_data/2022-12-13/5/booking.csv'. Could you please help me with this?

    • @maxcoteclearning
      @maxcoteclearning 1 year ago

      Are you sure you're running the code from the right branch? This video's code is not in the main branch. Check this commit: github.com/maxcotec/Apache-Airflow/tree/1787097721a8cec8999bdaee4c04a9f4bc0e1f71/DAG_ingestion_pipeline.

    • @prafulsoni9378
      @prafulsoni9378 1 year ago

      @@maxcoteclearning I'm also facing the same issue. I cloned the branch, and in `DAG_ingestion_pipeline` I ran `docker-compose up`

    • @prafulsoni9378
      @prafulsoni9378 1 year ago

      @@maxcoteclearning I am using Windows!