Real-time Streaming Data from Pub/Sub to BigQuery Using Dataflow in GCP

  • Published: 6 Sep 2024
  • Want to learn more about building big data pipelines in the cloud, and about AI & ML across multi-cloud platforms?
    Please share, support, and subscribe to the channel: lnkd.in/ehFZbVH5
    If you want to prepare for the Google Cloud Professional Data Engineer certification, my course "GCP Professional Data Engineer Certification - A Complete Guide" is live on Udemy.
    Go to this link.
    lnkd.in/gdy7cVmb
    GitHub URL: github.com/vig...
    You have been asked to assist the team with streaming temperature data into BigQuery using Pub/Sub and Dataflow. You receive a request to complete the following tasks (a Python sketch of these steps follows the standards below):
    1. Create a Cloud Storage bucket as the temporary location for a Dataflow job.
    2. Create a BigQuery dataset and table to receive the streaming data.
    3. Set up a Pub/Sub topic and test publishing messages to the topic.
    4. Create and run a Dataflow job to stream data from a Pub/Sub topic to BigQuery.
    5. Run a query to validate streaming data.
    Some standards you should follow:
    * Ensure that all required APIs (Dataflow, Pub/Sub, BigQuery, Cloud Storage) are enabled.
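    Below is a minimal end-to-end sketch of the five tasks above using the Google Cloud Python client libraries (google-cloud-storage, google-cloud-bigquery, google-cloud-pubsub). All names (project, bucket, dataset, table, topic) and the temperature schema are placeholders assumed for illustration rather than taken from the video, and the Dataflow step is shown as a gcloud command in a comment because the video launches the predefined template from the console:

```python
# Minimal sketch with assumed placeholder names/schema (not from the video).
# Prereqs: pip install google-cloud-storage google-cloud-bigquery google-cloud-pubsub
# Enable the required APIs first, e.g.:
#   gcloud services enable dataflow.googleapis.com pubsub.googleapis.com \
#       bigquery.googleapis.com storage.googleapis.com
import json

from google.cloud import storage, bigquery, pubsub_v1

PROJECT = "my-gcp-project"          # assumed placeholder
BUCKET = "my-dataflow-temp-bucket"  # assumed placeholder
DATASET = "streaming_demo"          # assumed placeholder
TABLE = "temperature"               # assumed placeholder
TOPIC = "temperature-topic"         # assumed placeholder

# 1. Cloud Storage bucket used as the Dataflow temp location.
storage.Client(project=PROJECT).create_bucket(BUCKET, location="US")

# 2. BigQuery dataset and table that will receive the streaming rows.
bq = bigquery.Client(project=PROJECT)
bq.create_dataset(f"{PROJECT}.{DATASET}", exists_ok=True)
schema = [
    bigquery.SchemaField("sensor_id", "STRING"),
    bigquery.SchemaField("temperature", "FLOAT"),
    bigquery.SchemaField("event_time", "TIMESTAMP"),
]
bq.create_table(bigquery.Table(f"{PROJECT}.{DATASET}.{TABLE}", schema=schema), exists_ok=True)

# 3. Pub/Sub topic plus a test message (the JSON keys must match the table
#    schema for the Dataflow template to insert the rows directly).
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT, TOPIC)
publisher.create_topic(request={"name": topic_path})
msg = {"sensor_id": "sensor-1", "temperature": 21.5, "event_time": "2024-09-06T10:00:00Z"}
publisher.publish(topic_path, json.dumps(msg).encode("utf-8")).result()

# 4. Stream from the topic to BigQuery with the predefined Dataflow template:
#   gcloud dataflow jobs run pubsub-to-bq \
#     --gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
#     --region us-central1 \
#     --staging-location gs://my-dataflow-temp-bucket/temp \
#     --parameters inputTopic=projects/my-gcp-project/topics/temperature-topic,outputTableSpec=my-gcp-project:streaming_demo.temperature

# 5. Validate the streaming data once the job is running.
for row in bq.query(f"SELECT * FROM `{PROJECT}.{DATASET}.{TABLE}` LIMIT 10").result():
    print(dict(row))
```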
    #gcp #cloud #cloudcomputing #dataengineering #gcpcloud #pubsub #storage #bigquery #bigdata #dataflow #airflow #python #sql

Comments • 7

  • @user-zn5tn9br3b • 10 months ago +1

    Hey, thanks for teaching, good explanation. I want to ask a stupid question > <
    Why not send data directly to BigQuery (only 1 step)?
    Send to Pub/Sub => Dataflow => BigQuery (3 steps...)? Thanks!!!

    • @cloudaianalytics6242 • 10 months ago +3

      This is a valid question for sure. A use case can be implemented in different ways, but as professionals we always aim to provide an efficient and optimized solution.
      1. Why not send data directly to BigQuery (only 1 step)?
      Ans: If I do that, it is not a streaming pipeline; it is just batch processing or a migration, for which I could use BigQuery Data Transfer or Storage Transfer. To implement any streaming use case we need a streaming service such as Pub/Sub or an equivalent third-party service on GCP.
      2. Send to Pub/Sub => Dataflow => BigQuery
      Ans: I can answer this in two ways:
      a. I can use Pub/Sub with a subscription and BigQuery, where data is streamed from the topic and pushed into the BigQuery table through the subscription. This is one way of doing it (see the sketch after this reply).
      (or)
      b. I can do it as shown in the video, where my Pub/Sub topic does not have a subscription; the messages are streamed from the topic into BigQuery via Dataflow's predefined template.
      My objective in this video is to show how to use the predefined templates that GCP provides in Cloud Dataflow to run a streaming pipeline from a Pub/Sub topic to BigQuery.
      The same use case can be implemented with other services such as Dataproc, Data Fusion, or a simple Python script.
      I hope it makes sense now. Please let me know if you have any questions 😀
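      For readers who want to try option (a) above, here is a minimal sketch using the google-cloud-pubsub client to create a BigQuery subscription, so messages flow from the topic straight into the table without a Dataflow job. The project, topic, subscription, and table names are assumed placeholders, not taken from the video, and the equivalent gcloud command is shown in a comment:

```python
# Sketch of option (a): a Pub/Sub BigQuery subscription, no Dataflow job.
# Assumed placeholder names; the table must already exist, and the Pub/Sub
# service account needs BigQuery write access (roles/bigquery.dataEditor).
from google.cloud import pubsub_v1

PROJECT = "my-gcp-project"                            # assumed placeholder
TOPIC = "temperature-topic"                           # assumed placeholder
SUBSCRIPTION = "temperature-bq-sub"                   # assumed placeholder
TABLE = "my-gcp-project.streaming_demo.temperature"   # assumed placeholder

subscriber = pubsub_v1.SubscriberClient()
topic_path = subscriber.topic_path(PROJECT, TOPIC)
sub_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION)

# Without a topic schema, each message body lands in the table's "data"
# column; write_metadata also stores message_id, publish_time, etc.
subscriber.create_subscription(
    request={
        "name": sub_path,
        "topic": topic_path,
        "bigquery_config": {"table": TABLE, "write_metadata": True},
    }
)

# Equivalent gcloud command:
#   gcloud pubsub subscriptions create temperature-bq-sub \
#     --topic=temperature-topic \
#     --bigquery-table=my-gcp-project:streaming_demo.temperature \
#     --write-metadata
```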

  • @ainvondegraff5233 • 5 months ago

    Awesome explanation, really wanted to know this. If I migrate the Control-M workload automation tool to GCP, how will I connect Control-M to Pub/Sub?

  • @riyanshigupta950 • 4 months ago

    Amazing content! Thanks

  • @zzzmd11 • 5 months ago

    Hi, thanks for the great, informative video. Can you explain the flow if the data source is a REST API? Can we have Dataflow configured to extract from a REST API into BigQuery without Cloud Functions or Apache Beam scripts involved? Thanks a lot in advance.

  • @ushasribhogaraju8895 • 5 months ago

    Thanks for your videos, I find them helpful. I could get a message published by a Python script to Pub/Sub written into the data column of a BigQuery table simply by creating a subscription (on the same topic) that writes to BigQuery, without using Dataflow. Since Pub/Sub is schemaless, it receives whatever schema the Python script publishes. My question is: is there a way to update a BigQuery table using the same schema received in Pub/Sub?

  • @Rajdeep6452 • 5 months ago

    Hey bro, thanks for the video. I have an ETL process running on a VM, using Docker and Kafka, and the data gets stored in BigQuery as soon as I run the producer and consumer manually. I wanted to use Cloud Composer to automate this (so that whenever I log in to my VM the ETL process starts automatically), but I couldn't. Can you tell me if it's possible to do this with Dataflow? I am having trouble setting it up.