Part 5 - Connect VS code and create task 1 | Airflow Tutorial | Automate EMR Jobs with Airflow

  • Published: 20 Nov 2023
  • #dataengineering #emr #airflow #spark #pyspark #aws #etlpipeline #redfin
    In this video, I explain how to use Airflow to automate EMR jobs. I show how to create an EMR cluster, poll the cluster's state, add EMR steps, and terminate the cluster. Airflow orchestrates the entire ETL pipeline.
    The EMR steps extract Redfin data from the Redfin Data Center web address and then run a transformation step on it. Both the raw data and the transformed data are loaded into an S3 bucket.
    Apache Airflow is an open-source platform used for orchestrating and scheduling workflows of tasks and data pipelines.
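As a rough illustration of the pipeline described above, here is a minimal sketch of the two payloads such a DAG typically needs: the cluster settings and the list of EMR steps. It is not the code from the video. The bucket name, script paths, instance types, and release label are all hypothetical placeholders. In Airflow's Amazon provider, dictionaries shaped like these are usually passed to `EmrCreateJobFlowOperator(job_flow_overrides=...)` and `EmrAddStepsOperator(steps=...)`, with sensors polling the cluster and step states in between.

```python
from __future__ import annotations

from typing import Any

# Hypothetical bucket; replace with your own. Not taken from the video.
BUCKET = "s3://example-redfin-bucket"


def build_job_flow_overrides(name: str = "redfin-emr-cluster") -> dict[str, Any]:
    """Return cluster settings in the shape accepted by EMR's RunJobFlow API."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.13.0",  # assumed release label
        "Applications": [{"Name": "Spark"}],
        "LogUri": f"{BUCKET}/emr-logs/",
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary node",
                    "Market": "ON_DEMAND",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core nodes",
                    "Market": "ON_DEMAND",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 2,
                },
            ],
            # Keep the cluster alive so a later task can terminate it explicitly.
            "KeepJobFlowAliveWhenNoSteps": True,
            "TerminationProtected": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def spark_submit_step(name: str, script_uri: str) -> dict[str, Any]:
    """Build one EMR step that runs a script with spark-submit."""
    return {
        "Name": name,
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", script_uri],
        },
    }


def build_steps() -> list[dict[str, Any]]:
    """Extraction followed by transformation; script paths are hypothetical."""
    return [
        spark_submit_step("Extract Redfin data", f"{BUCKET}/scripts/extract.py"),
        spark_submit_step("Transform Redfin data", f"{BUCKET}/scripts/transform.py"),
    ]


# Task order mirroring the description: create, poll, add steps, poll, terminate.
TASK_ORDER = [
    "create_emr_cluster",
    "wait_for_cluster",
    "add_emr_steps",
    "wait_for_steps",
    "terminate_emr_cluster",
]
```

Keeping the payloads in plain functions makes them easy to inspect or unit test before wiring them into a DAG.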
    Please don’t forget to LIKE, SHARE, COMMENT and SUBSCRIBE to our channel for more AWESOME videos.
    *Books I recommend*
    1. Grit: The Power of Passion and Perseverance amzn.to/3EZKSgb
    2. Think and Grow Rich!: The Original Version, Restored and Revised: amzn.to/3Q2K68s
    3. The Book on Rental Property Investing: How to Create Wealth With Intelligent Buy and Hold Real Estate Investing: amzn.to/3LLpXRy
    4. How to Invest in Real Estate: The Ultimate Beginner's Guide to Getting Started: amzn.to/48RbuOb
    5. Introducing Python: Modern Computing in Simple Packages amzn.to/3Q4driR
    6. Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter 3rd Edition: amzn.to/3rGF73G
    **************** Commands used in this video ****************
    Check out my github Repo
    github.com/YemiOla/data_engin...
    **************** USEFUL LINKS ****************
    1. Redfin Analytics|python ETL pipeline with airflow|Data Engineering Project|Snowpipe|Snowflake|Part 1 • Redfin Analytics|pytho...
    2. What is AWS EMR | Extract and Transform Redfin data with AWS EMR | EMR Studio | Pyspark Notebook • What is AWS EMR | Extr...
    3. How to remotely SSH (connect) Visual Studio Code to AWS EC2 • How to remotely SSH (c...
    4. Apache Airflow Playlist • How to build and autom...
    5. www.redfin.com/news/data-center/
    6. airflow.apache.org/docs/apach...
    7. registry.astronomer.io/provid...
    8. docs.aws.amazon.com/emr/lates...
    9. registry.astronomer.io/provid...
    10. registry.astronomer.io/provid...
    11. PostgreSQL Playlist: • Tutorial 1 - What is D...
    DISCLAIMER: This video and description contain affiliate links. This means that when you buy through one of these links, we receive a small commission at no cost to you. This helps support us so we can keep making valuable content for you.