Cloud Data Engineering Project: Migrating an On-Premises Data Pipeline to AWS

  • Published: 19 Aug 2024
  • In this video, you will see:
    - Introduction: Overview of the project and the motivation behind migrating from an on-premises setup to AWS.
    - On-Premises Pipeline Overview: Understanding the initial pipeline using HDFS, Spark, Kafka, PostgreSQL, and Power BI (details here: [ • Big Data engineering P... ])
    - AWS Pipeline Architecture:
    - Data Collection: Scraping data from Jumia using BeautifulSoup, transforming it with Python and pandas, and storing it in Amazon S3 (a minimal sketch follows this list).
    - Data Processing and Cataloging: Using AWS Glue Crawler to automatically catalog the data stored in S3.
    - Data Analysis: Running SQL queries on the data using Amazon Athena.
    - Data Visualization: Creating insightful visualizations with Amazon QuickSight.
    - Results Storage: Storing SQL query results in an S3 bucket.
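
    A minimal sketch of the data-collection step above, in Python: request a listing page, parse it with BeautifulSoup, clean the rows with pandas, and write a CSV to S3 with boto3. The category URL, CSS selectors, bucket name, and object key are placeholders, not the exact values used in the video.

    import io
    import boto3
    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    CATEGORY_URL = "https://www.jumia.ma/..."   # placeholder: category page to scrape
    RAW_BUCKET = "jumia-raw-data"               # placeholder: raw-data S3 bucket

    def scrape_products(url):
        # Fetch the listing page and parse it with BeautifulSoup.
        html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        rows = []
        for card in soup.select("article.prd"):       # selectors are assumptions
            name = card.select_one("h3.name")
            price = card.select_one("div.prc")
            if name and price:
                rows.append({"name": name.get_text(strip=True),
                             "price": price.get_text(strip=True)})
        return pd.DataFrame(rows)

    def upload_csv(df, bucket, key):
        # Serialize the DataFrame to CSV in memory and upload it to S3.
        buffer = io.StringIO()
        df.to_csv(buffer, index=False)
        boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())

    if __name__ == "__main__":
        df = scrape_products(CATEGORY_URL)
        # Basic cleaning with pandas: strip currency symbols and cast to float.
        df["price"] = df["price"].str.replace(r"[^\d.]", "", regex=True).astype(float)
        upload_csv(df, RAW_BUCKET, "raw/jumia_products.csv")
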
    Detailed Steps Covered:
    1. Setting up S3 buckets for raw and processed data.
    2. Implementing web scraping with BeautifulSoup.
    3. Configuring the AWS Glue Crawler and Data Catalog (sketched after this list).
    4. Running data analysis queries with Amazon Athena (sketched below).
    5. Visualizing data with Amazon QuickSight.
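
    Step 3 can also be scripted with boto3. A rough sketch, assuming a hypothetical crawler name, IAM role, Data Catalog database, and S3 path:

    import boto3

    glue = boto3.client("glue")

    # Create a crawler that scans the raw-data prefix in S3 and registers a
    # table in the Glue Data Catalog (names and role ARN are placeholders).
    glue.create_crawler(
        Name="jumia-raw-crawler",
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
        DatabaseName="jumia_db",
        Targets={"S3Targets": [{"Path": "s3://jumia-raw-data/raw/"}]},
    )

    # Run it; once it finishes, the cataloged table is queryable from Athena.
    glue.start_crawler(Name="jumia-raw-crawler")
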
    #AWS #DataEngineering #CloudComputing #DataPipeline #BigData #WebScraping #AmazonS3 #AWSGlue #AmazonAthena #AmazonQuickSight #Python #BeautifulSoup
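
    A sketch of step 4: submitting a SQL query to Athena and letting it write the result set to an S3 bucket. The database, table, and output location are placeholders consistent with the crawler sketch above.

    import time
    import boto3

    athena = boto3.client("athena")

    # Run a query against the table the Glue crawler registered; Athena writes
    # the result set as CSV to the given S3 output location.
    execution = athena.start_query_execution(
        QueryString="SELECT name, price FROM jumia_products ORDER BY price DESC LIMIT 10",
        QueryExecutionContext={"Database": "jumia_db"},
        ResultConfiguration={"OutputLocation": "s3://jumia-query-results/athena/"},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)
    print("Query finished with state:", state)

    For step 5, the cataloged table (or a saved Athena query) is typically added as a dataset in QuickSight through the QuickSight console, where the visualizations are built interactively.
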

Comments • 1

  • @aymanemaghouti2065 · A month ago · +1

    To see my latest project, refer to: [ ruclips.net/video/MDplJJlo-y8/видео.htmlsi=yTQbgnnUXjEv7FEH ]