Zillow Data Analytics (RapidAPI) | End-To-End Python ETL Pipeline | Data Engineering Project |Part 1

  • Published: Oct 17, 2024
  • This is part 1 of the Zillow data analytics end-to-end data engineering project.
    In this data engineering project, we will learn how to build and automate, using Apache Airflow, a Python ETL process that extracts real estate property data from the Zillow Rapid API and loads it into an Amazon S3 bucket. That upload triggers a series of Lambda functions, which transform the data, convert it to CSV format and load it into another S3 bucket. Apache Airflow uses an S3KeySensor to check that the transformed data has been uploaded to the S3 bucket before attempting to load the data into Amazon Redshift.
    After the data is loaded into Redshift, we will connect Amazon QuickSight to the Redshift cluster to visualize the Zillow (Rapid API) data.
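To make the transform step above more concrete, here is a minimal sketch of converting a JSON payload to CSV text using only the Python standard library. It is not the code from the video: the `results` key, the column names and the sample listings are hypothetical and chosen purely for illustration, and the real Zillow Rapid API response may be shaped differently.

```python
import csv
import io
import json

# Hypothetical column names; the real API response may use different fields.
SELECTED_COLUMNS = ("zpid", "city", "price", "bedrooms", "bathrooms")


def records_to_csv(raw_json, columns=SELECTED_COLUMNS):
    """Convert a JSON payload holding a list of listings into CSV text."""
    payload = json.loads(raw_json)
    # Assumption: the listings sit under a top-level "results" key.
    records = payload.get("results", [])

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keep only the selected columns; missing fields become empty cells.
        writer.writerow({name: record.get(name, "") for name in columns})
    return buffer.getvalue()


# Made-up sample data standing in for an API response.
sample = json.dumps({"results": [
    {"zpid": 1, "city": "Houston", "price": 250000, "bedrooms": 3, "bathrooms": 2, "extra": "x"},
    {"zpid": 2, "city": "Austin", "price": 410000, "bedrooms": 4},
]})
csv_text = records_to_csv(sample)
print(csv_text)
```

In a Lambda function the same idea would typically be wrapped in a handler that reads the JSON object from one bucket and writes the CSV text to another. That wiring is left out here to keep the sketch self-contained.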
    Apache Airflow is an open-source platform for orchestrating and scheduling workflows of tasks and data pipelines. This project will be carried out entirely on the AWS cloud platform.
    In this video I will show you how to install Apache Airflow from scratch and schedule your ETL pipeline. I will also show you how to use a sensor in your ETL pipeline. In addition, I will show you how to set up an AWS Lambda function from scratch, and how to set up Amazon Redshift and Amazon QuickSight.
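For readers new to sensors, the idea can be sketched in plain Python: a sensor repeatedly checks a condition and only lets the pipeline continue once the condition holds or a timeout is reached. The function below is an illustrative stand-in, not Airflow's implementation. `wait_for_key` and `fake_key_exists` are hypothetical names, and the `poke_interval` and `timeout` parameters are loosely named after the corresponding Airflow sensor options.

```python
import time


def wait_for_key(key_exists, poke_interval=0.01, timeout=1.0):
    """Poll key_exists() until it returns True or the timeout elapses.

    Returns True if the key appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if key_exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


# Stand-in for an S3 lookup: the "file" shows up on the third check.
checks = {"count": 0}


def fake_key_exists():
    checks["count"] += 1
    return checks["count"] >= 3


found = wait_for_key(fake_key_exists)
print(found, checks["count"])
```

In the actual pipeline the check would ask S3 whether the transformed CSV file exists, and the load into Redshift would only be attempted after the check succeeds.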
    As this is a hands-on project, I highly encourage you to first watch the video in its entirety without typing along so that you can better understand the concepts and the workflow. After that, either try to replicate the example on your own, consulting the video only when you are stuck, or watch the video a second time in its entirety while typing along.
    Remember the best way to learn is by doing it yourself - Get your hands dirty!
    If you have any questions or comments, please leave them in the comment section below.
    Please don’t forget to LIKE, SHARE, COMMENT and SUBSCRIBE to our channel for more AWESOME videos.
    **************** Commands used in this video ****************
    sudo apt update
    sudo apt install python3-pip
    sudo apt install python3.10-venv
    python3 -m venv endtoendyoutube_venv
    source endtoendyoutube_venv/bin/activate
    pip install --upgrade awscli
    sudo pip install apache-airflow
    airflow standalone
    pip install apache-airflow-providers-amazon
    *Books I recommend*
    1. Grit: The Power of Passion and Perseverance amzn.to/3EZKSgb
    2. Think and Grow Rich!: The Original Version, Restored and Revised: amzn.to/3Q2K68s
    3. The Book on Rental Property Investing: How to Create Wealth With Intelligent Buy and Hold Real Estate Investing: amzn.to/3LLpXRy
    4. How to Invest in Real Estate: The Ultimate Beginner's Guide to Getting Started: amzn.to/48RbuOb
    5. Introducing Python: Modern Computing in Simple Packages amzn.to/3Q4driR
    6. Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter 3rd Edition: amzn.to/3rGF73G
    **************** USEFUL LINKS ****************
    How to remotely SSH (connect) Visual Studio Code to AWS EC2: • How to remotely SSH (c...
    Extract current weather data from Open Weather Map API using python on AWS EC2: • Extract current weathe...
    How to send out email alert ON RETRY and ON FAILURE in Apache airflow | Airflow Tutorial • How to send out email ...
    Monitor workflow with slack alert upon DAG failure | Airflow Tutorial • Monitor workflow with ...
    How to build and automate a python ETL pipeline and slack alert with airflow | Airflow Tutorial • How to build and autom...
    PostgreSQL Playlist: • Tutorial 1 - What is D...
    Rapid API: rapidapi.com/hub
    AWS Lambda function - Create your first Lambda Function | Lambda Function Tutorial for beginners • AWS Lambda function - ...
    Github Repo: github.com/Yem...
    airflow.apache...
    airflow.apache...
    airflow.apache...
    airflow.apache...
    Part 2: • Zillow Data Analytics ...
    Part 3: • Zillow Data Analytics ...
    DISCLAIMER: This video and description have affiliate links. This means when you buy through one of these links, we will receive a small commission at no cost to you. This helps support us so we can continue making awesome and valuable content for you.
    #dataengineering #airflow

Comments • 84

  • @i_am_out_of_office_
    @i_am_out_of_office_ 8 months ago +8

    Awesome Tutorial
    Your dedication to teaching end-to-end data engineering pipelines is truly inspiring. Your guidance has not only deepened my understanding of complex concepts but also empowered me to navigate the intricacies of building robust data pipelines. Thank you for your unwavering support and commitment to fostering knowledge in this dynamic field. Love from India 🚩

    • @krishnakumarkumar5710
      @krishnakumarkumar5710 8 months ago

      Being out of the office you learn all??

    • @tuplespectra
      @tuplespectra 8 months ago

      Thank you so much for your comment. It really means a lot to me.

  • @seth_king_codes
    @seth_king_codes 11 months ago +3

    BEST channel on youtube for learning about data engineering...thank you man
    your content inspires me

    • @tuplespectra
      @tuplespectra 10 months ago

      Thanks so much for this comment. It really means a lot to me

  • @GeAsita
    @GeAsita 1 month ago

    Love from Argentina, mate!! Amazing tutorial! Not only do you clearly show that you have worked with these tools (unlike most "gurus"), but you also explain the theory at the start. This is just golden content, and God bless you for making it!

  • @nicholasmageto6110
    @nicholasmageto6110 4 months ago

    The best ETL video I have ever come across. Thank you sir ❤‍🔥❤‍🔥❤‍🔥💯💯

  • @tuananhdo6006
    @tuananhdo6006 4 months ago

    This is just what I have been searching for, thank you good sir, please kindly post more videos, you are awesome

  • @gyungyoonpark
    @gyungyoonpark 8 months ago +1

    thank you for the awesome tutorial!!! can't wait to start part 2.
    just one correction. in the "commands used", please add "sudo apt install awscli" as well.

  • @R_SinghRajput
    @R_SinghRajput 5 months ago

    Since I’m a mech engineer, coding is almost like Mandarin to me, but your explanation was great, sir 🙏🏻🔥🫡🫡 Really loved it and totally understood ❤❤

  • @bodzio7843
    @bodzio7843 1 month ago

    You are great, mate. Such an amount of knowledge.

  • @dudee420
    @dudee420 3 months ago +1

    Bro, your explanation is really amazing. Nobody explains at that level. If possible, can you also start some videos on GCP cloud data engineering projects? Thank you for the great learning.

    • @avinash390
      @avinash390 3 months ago

      Hey bro .... Did you complete this project on AWS , how much was the total cost or it was within the free tier limit

  • @nameisnani5573
    @nameisnani5573 10 months ago

    Awesome Brother, This is the Best channel i have ever seen in youtube to learn something real. Great work, Nobody can explain like you did, Thank you soo much, Lots of love for you. Keep doing this. Thanks a lott again.

    • @tuplespectra
      @tuplespectra 10 months ago

      Thanks so much for your comment. I really appreciate it, and it means a lot to me and motivates me to do more.

  • @zuesbenz
    @zuesbenz 6 months ago

    another good video from you. keep it going, keep it simple to the point and let it flow together end to end. just as you have been doing.

    • @tuplespectra
      @tuplespectra 6 months ago

      Thanks so much for your comment.

  • @pareekshitgaddam9912
    @pareekshitgaddam9912 3 months ago

    Amazing content! Thank you brother. Please do upload more such videos!!

  • @shivanshhedaoo7268
    @shivanshhedaoo7268 11 months ago +1

    Hi after airflow standalone i am getting error:
    ModuleNotFoundError: No module named 'connexion.decorators.validation'
    How do I fix this?

  • @shumengshi5925
    @shumengshi5925 6 months ago

    Thank you for the wonderful tutorial! It's been incredibly helpful, and I've already subscribed to your YouTube channel!
    I have a question about the necessity of using EC2 in this project. Would it be possible to achieve the same results by simply installing Apache Airflow locally within a Python virtual environment? I followed your steps closely, but when I run a DAG with tasks to extract Zillow data via the Rapid API, the DAG seems to get stuck in the running state indefinitely without completing, and it doesn't generate any logs.
    Interestingly, when I test the Rapid API locally in a plain Python file, it works perfectly fine. Additionally, when I create a DAG without making requests to the API, it also works without any issues. The problem only arises when the DAG task attempts to access Zillow data via the Rapid API.
    I'm curious if this is why EC2 is used in the project. Any insights you could provide would be greatly appreciated! Thanks again for putting out great Data Engineering content!!

  • @sophialawal7306
    @sophialawal7306 4 months ago

    which app did you use to create the data pipeline visualization?

  • @kandoras.guzman6705
    @kandoras.guzman6705 11 months ago

    This was awesome! Thank you for this resource.

  • @srinivasrepala1
    @srinivasrepala1 1 month ago

    ❤ good content

  • @rajkumardubey5486
    @rajkumardubey5486 2 months ago

    We can also use the .env file for encryption of api key and use envloader

  • @assieneolivier5560
    @assieneolivier5560 9 months ago

    Great and explicative video guys!! Amazing!!!

    • @tuplespectra
      @tuplespectra 9 months ago

      Thanks so much! I'm glad you like it.

  • @akj3344
    @akj3344 1 year ago

    At 19 seconds, already liked and subscribed.

    • @tuplespectra
      @tuplespectra 1 year ago

      Awesome. Thanks so much. And thanks for finding our video valuable.

  • @HarrisKeith-r5x
    @HarrisKeith-r5x 11 months ago

    Hey, let's say I do this end-to-end mapping myself: how much will it cost me to use their services? Can I do this in the free tier, plus the additional cost I may incur using an EC2 instance that is not free, like you mentioned?

  • @tolu_datacation
    @tolu_datacation 1 year ago +2

    Very explanatory!

  • @Friendsforever-rg2bq
    @Friendsforever-rg2bq 2 months ago +1

    Amazing man..!

  • @joshuaroberts3987
    @joshuaroberts3987 11 months ago +1

    My IP address refuses to connect after I opened port 8080. It showed the Airflow login and I put in my credentials, then it showed a "refused to connect" screen.

  • @himanshupatil6661
    @himanshupatil6661 7 months ago

    I am getting an error while executing airflow standalone:
    TypeError: SqlAlchemySessionInterface.__init__() missing 6 required positional arguments: 'sequence', 'schema', 'bind_key', 'use_signer', 'permanent', and 'sid_length'

  • @QuanNguyen-z2g
    @QuanNguyen-z2g 11 months ago

    I just wonder why this data pipeline uses Lambda for loading and transforming data instead of Glue Spark jobs?

  • @AlDamara-x8j
    @AlDamara-x8j 1 year ago

    Thanks for this great tutorial! Question: Is it possible to use Cloud9 as our IDE and from there access our EC2, or vice versa?

    • @tuplespectra
      @tuplespectra 1 year ago

      I believe you should be able to use it, although I have not used it for an Airflow project before. You will have to provision a Cloud9 IDE, and you will have to pay for it unless there is a free tier that you can use.

  • @nayanroy13
    @nayanroy13 1 year ago +1

    Your content is very useful!

    • @tuplespectra
      @tuplespectra 1 year ago

      Thanks so much. Your comment means a lot to us and I'm glad that you find our content useful and valuable.

  • @sibisuriyanarayantiruchira2302
    @sibisuriyanarayantiruchira2302 1 year ago +1

    Very helpful! Thank you so much :)

    • @tuplespectra
      @tuplespectra 1 year ago

      Thank you so much. I'm glad you find it helpful.

  • @pranalidarekar_5852
    @pranalidarekar_5852 11 months ago

    Thanks for the tutorial. I am trying to connect VS Code to the same EC2 instance we created in this project, but it shows that permission is denied due to the public key. I followed each step from your other video, 'How to remotely SSH (connect) Visual Studio Code to AWS EC2'. Please help me with this. I have tried everything, but it shows me the same issue. I am using a MacBook. Thank you for your time!

    • @tuplespectra
      @tuplespectra 11 months ago

      Maybe you need to set permissions on the .pem file, for example by running "chmod 400 path/to/filename". Another issue might be the syntax in your config file. You need to make sure you write it the way it should be, with lower case where it is supposed to be, etc.

  • @maxubani9219
    @maxubani9219 9 months ago

    GOD BLESS YOU!❤

  • @navaneethur5466
    @navaneethur5466 7 months ago

    Hi sir, the Airflow option is not visible in the VS Code interface even after installing it on the Ubuntu instance.

    • @nikhitabiradar2146
      @nikhitabiradar2146 2 months ago

      Hi, I'm facing the same issue. Were you able to resolve it?

  • @darshan9340
    @darshan9340 11 months ago +1

    Hi,
    The project is really good, got to learn so much.
    I have an error while I am trying to transfer my file from ec2 to s3 bucket.
    File "/usr/local/lib/python3.10/dist-packages/airflow/operators/bash.py", line 210, in execute
    raise AirflowException(
    airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 127.
    I have checked my bash code, it is perfectly fine. My first dag, python operator is running and creating the file but when it comes to bash operator task, it is failing.

    • @pranalidarekar_5852
      @pranalidarekar_5852 11 months ago

      this happened with me too..what is the solution?

    • @darshan9340
      @darshan9340 11 months ago

      ​@@pranalidarekar_5852 I had a spelling mistake in my code, that's the reason why it was not running.

    • @pranalidarekar_5852
      @pranalidarekar_5852 11 months ago

      where exactly did you make the mistake
      @@darshan9340

    • @gyungyoonpark
      @gyungyoonpark 8 months ago

      @@darshan9340 I have the same error. can you tell me where you were wrong?

  • @omkarmore2198
    @omkarmore2198 1 year ago

    Excellent ...

  • @salmanshikalgar4482
    @salmanshikalgar4482 2 months ago

    Pip install --upgrade awscli command not running in virtual environment

  • @lesa7p2lmansion
    @lesa7p2lmansion 1 year ago

    guys anybody can help with timestamps on the videos ? it will be really helpful
    I am doing the project and putting it on github and linkedin when I finish...Thanks

  • @inadaldaldaldal8231
    @inadaldaldaldal8231 1 month ago

    can you Azure platform

  • @abduljaweed8131
    @abduljaweed8131 1 year ago +2

    Make one ETL project with Apache airflow without using any cloud

    • @akj3344
      @akj3344 1 year ago +3

      Why though? In your job, you'll be expected to work with cloud.

    • @abduljaweed8131
      @abduljaweed8131 1 year ago +1

      @@akj3344 yes but cloud is expensive so to understand the technology doing on local machine I think its good then after know the tech doing experimentation with cloud

    • @Edbwalz
      @Edbwalz 1 year ago

      ​​@@abduljaweed8131Let me take you through an overview of a project that you can do without using cloud:
      First start by working with a CSV file. What you do is upload that file to an s3 bucket, and then load the data from the s3 bucket and basically transform it to parquet data type, and then write to another s3 bucket. After that you can use airflow to orchestrate the tasks.
      Now instead of using s3, you can use minio. It's an open source tool with an S3-compatible API. In fact, the airflow operators for s3 can be used on minio as well.
      You can use pandas dataframe to do the transformation to parquet and write the file to the minio bucket. If you want to get a bit more fancy, you can use spark to do the same thing (it leverages the use of dataframes).
      After working with a file, then you easily change the data source to api endpoint.
      I can help if you want. Just ask if you need more clarification. I just gave an overview basically.

    • @cOnfidentialcOrp
      @cOnfidentialcOrp 11 months ago

      @@abduljaweed8131
      The main reason cloud is used among big companies is that it's cheap compared with building your own data center.
      Also, AWS and Azure have free tier plans, enough for you to learn as well.

  • @amanpirjade9
    @amanpirjade9 1 year ago

    Make video on AWS data analytics services project

  • @Mehtre108
    @Mehtre108 8 months ago

    Domain name pls

  • @kanchandendge1517
    @kanchandendge1517 1 year ago +1

    Airflow Standalone command getting stuck. not creating user and password . @tuplespectra, could you please help

    • @tuplespectra
      @tuplespectra 1 year ago

      Can you kill the server (Ctrl + C) and then restart it?

    • @kartikeymishra2673
      @kartikeymishra2673 1 year ago

      hey were you able to fix this error?
      I also faced the same issue !

    • @kartikeymishra2673
      @kartikeymishra2673 1 year ago

      @@tuplespectra well this really helped , thanks :)

    • @tuplespectra
      @tuplespectra 11 months ago

      @@kartikeymishra2673 You are welcome.

    • @nikkim94nikhil
      @nikkim94nikhil 9 months ago

      @@tuplespectra Hey, i'm getting a typeerror and not getting stuck but not creating user and password either! Can you help please

  • @Nari_Nizar
    @Nari_Nizar 1 year ago

    Thank you so much for such an awesome tutorial. I wanted to run this code and I am getting this error:
    WARNING - Error when trying to pre-import module 'airflow.providers.amazon.aws.sensors.s3' found in /home/ubuntu/airflow/dags/zillowanalytics.py: No module named 'airflow.providers.amazon'
    Please help!

    • @Nari_Nizar
      @Nari_Nizar 1 year ago

      @tuplespectra could you please help?

    • @tuplespectra
      @tuplespectra 1 year ago

      @@Nari_Nizar did you remember to do a "pip install apache-airflow-providers-amazon"?

    • @Nari_Nizar
      @Nari_Nizar 1 year ago

      @@tuplespectra it worked! Thank you very much, this is an excellent project!

    • @tuplespectra
      @tuplespectra 1 year ago

      @@Nari_Nizar Thanks. I'm glad it worked and you found the project valuable. Please help Like our videos and Share with your friends, team mates, colleagues so more people can benefit. Thanks so much.