5.2 - Airflow-Dataproc Integration | Apache Spark on Dataproc | Google Cloud Series

  • Published: 4 Dec 2024

Comments • 24

  • @soumyasourabh8799 • 1 year ago

    Huge props, Sushil. The way you simplified the concepts, things just slotted into place in my head. Great work, really appreciate it!

  • @sahillohiya7658 • 1 year ago

    You are so underrated.

  • @swapnamohankumar • 1 year ago

    @Sushil Kumar, thank you so much for this wonderful playlist on GCP... it will really help beginners. Good explanation, and the end-to-end data pipeline flow is covered.

  • @limeraghu579 • 3 years ago

    Very good. You have a God-given talent; use it well.

  • @chetanhirapara7082 • 2 years ago

    Thank you so much, Sushil Kumar. You are doing an awesome job for learners.
    It would be a great and complete playlist if you added the installation of Airflow on GCP.

  • @VaishnavKolte-l3c • 1 year ago

    You are doing a great job; these videos are very helpful. I had one doubt in this video: how is Airflow able to access Dataproc, create a cluster, and fire Spark jobs at it? Don't we need to add some kind of permission in Dataproc to allow Airflow to access it, and also some configuration in Airflow?
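
    A hedged editorial note on this question (not covered in the thread itself): Airflow authenticates to GCP through a connection (gcp_conn_id, default "google_cloud_default"), typically backed by a service account key file or, on Cloud Composer, the environment's own service account. That service account needs IAM roles in the target project, commonly roles/dataproc.editor to create clusters and submit jobs; no separate setting is required on the Dataproc side. A minimal sketch with placeholder names:

    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateClusterOperator,
    )

    # The operator uses whatever credentials back this Airflow connection;
    # the service account behind it needs Dataproc permissions (e.g. the
    # roles/dataproc.editor role) in the target project.
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id="my-project",      # placeholder project
        region="us-central1",
        cluster_name="demo-cluster",
        cluster_config={
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
        },
        gcp_conn_id="google_cloud_default",  # connection holding the credentials
    )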

  • @Amarjeet-fb3lk • 1 year ago

    Nice video.
    But how can we create a Dataproc cluster with GCS storage and a SQL metastore using these Airflow operators?
    What if I need to read data from one GCS bucket and write to another bucket?
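
    A hedged sketch of one way to do this (bucket, project, and instance names are placeholders): the cluster's GCS staging bucket goes in config_bucket, and a Cloud SQL Hive metastore is commonly attached via the public cloud-sql-proxy initialization action. Reading from one bucket and writing to another is done inside the Spark job itself with gs:// paths.

    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateClusterOperator,
    )

    # Cluster backed by a GCS staging bucket, with the Hive metastore on a
    # Cloud SQL instance reached through the cloud-sql-proxy init action.
    CLUSTER_CONFIG = {
        "config_bucket": "my-staging-bucket",
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
        "gce_cluster_config": {
            "metadata": {
                # placeholder Cloud SQL instance holding the Hive metastore
                "hive-metastore-instance": "my-project:us-central1:hive-metastore-sql",
            },
        },
        "initialization_actions": [
            {
                "executable_file": (
                    "gs://goog-dataproc-initialization-actions-us-central1/"
                    "cloud-sql-proxy/cloud-sql-proxy.sh"
                )
            }
        ],
    }

    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id="my-project",
        region="us-central1",
        cluster_name="demo-cluster",
        cluster_config=CLUSTER_CONFIG,
    )

    Cross-bucket I/O then lives in the PySpark script, not in the operator:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cross-bucket").getOrCreate()
    df = spark.read.csv("gs://input-bucket/data/", header=True)  # placeholder buckets
    df.write.parquet("gs://output-bucket/results/")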

  • @Kondaranjith3 • 1 year ago

    Airflow basics needed, sir.

  • @ExploreWithArghya • 1 year ago

    Hi Sushil, I am following your GCP videos. I have two doubts: first, how do I set the job_id of Dataproc jobs via Airflow, and second, which is very important, how do I add 'additional Python files' to a Dataproc job via Airflow?
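
    A minimal sketch for both questions, assuming DataprocSubmitJobOperator and placeholder names: the job's reference.job_id field sets the Dataproc job ID, and python_file_uris in pyspark_job ships additional Python files to the job.

    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocSubmitJobOperator,
    )

    PYSPARK_JOB = {
        # reference.job_id fixes the Dataproc job ID; it must be unique per
        # project/region, so a templated value is a common choice.
        "reference": {"project_id": "my-project", "job_id": "my-job-{{ ds_nodash }}"},
        "placement": {"cluster_name": "demo-cluster"},
        "pyspark_job": {
            "main_python_file_uri": "gs://my-bucket/jobs/main.py",
            # additional Python files distributed to the job
            "python_file_uris": [
                "gs://my-bucket/jobs/helpers.py",
                "gs://my-bucket/jobs/utils.zip",
            ],
        },
    }

    submit_job = DataprocSubmitJobOperator(
        task_id="submit_job",
        project_id="my-project",   # placeholder
        region="us-central1",
        job=PYSPARK_JOB,
    )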

  • @sundarrajkumaresan8045 • 2 years ago

    Create a playlist for all the services and the different Dataflow templates.
    Once again, thanks for the useful content!

  • @prachiagarwal9457 • 2 years ago

    How can I mark the Airflow task as failed when the Spark job fails? The operator used to submit the Spark job is DataprocSubmitJobOperator.
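
    A hedged note on this: in recent versions of the Google provider, DataprocSubmitJobOperator waits for the job by default and raises an exception (failing the task) if the Dataproc job ends in error; only with asynchronous=True does it return immediately, in which case a DataprocJobSensor can do the checking. A sketch under those assumptions, with placeholder names:

    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocSubmitJobOperator,
    )
    from airflow.providers.google.cloud.sensors.dataproc import DataprocJobSensor

    PYSPARK_JOB = {
        "placement": {"cluster_name": "demo-cluster"},
        "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/main.py"},
    }

    # Default mode: the task blocks until the job finishes and fails if the
    # Dataproc job fails; usually no extra handling is needed.
    submit_job = DataprocSubmitJobOperator(
        task_id="submit_job",
        project_id="my-project",
        region="us-central1",
        job=PYSPARK_JOB,
    )

    # Fire-and-forget mode: submit asynchronously and let a sensor fail the
    # run when the job errors out.
    submit_async = DataprocSubmitJobOperator(
        task_id="submit_async",
        project_id="my-project",
        region="us-central1",
        job=PYSPARK_JOB,
        asynchronous=True,
    )

    wait_for_job = DataprocJobSensor(
        task_id="wait_for_job",
        project_id="my-project",
        region="us-central1",
        # the submit task pushes the Dataproc job ID to XCom
        dataproc_job_id="{{ ti.xcom_pull(task_ids='submit_async') }}",
    )

    submit_async >> wait_for_job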

  • @aldoescobar3973 • 2 years ago

    Did you try Spark serverless?

  • @loke261989 • 2 years ago

    What permissions are needed for Airflow to manage Dataproc assets? Can you please explain?

  • @AdityaAlkhaniya_adi • 3 years ago • +1

    If possible, can you please make a detailed demo video on Cloud Composer?

    • @kaysush • 3 years ago • +2

      Sure. I’ll add that as a separate video and post the link here. Thanks.

    • @kaysush • 3 years ago

      Hey Aditya, I’ve added the video on Composer. Please have a look and let me know if you have any feedback. ruclips.net/video/g6Fmrmh8C20/видео.html

    • @etgcrog1 • 2 years ago

      @@kaysush thanks

  • @etgcrog1 • 2 years ago

    How can I create the DAG?

    • @kaysush • 2 years ago

      The DAG is a Python file. You put it in the $AIRFLOW_HOME/dags folder.
      Depending on how your Airflow instance is configured, that could be either a bucket (if you are using Cloud Composer) or a folder on the filesystem.
      Watch my video on Cloud Composer to know more.
      Thanks
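
      For readers new to this, a minimal DAG sketch (placeholder names) that would live as a .py file in that dags folder:

      from datetime import datetime

      from airflow import DAG
      from airflow.operators.bash import BashOperator

      # Save as e.g. $AIRFLOW_HOME/dags/hello_dag.py; the scheduler picks it
      # up automatically after a short scan interval.
      with DAG(
          dag_id="hello_dag",
          start_date=datetime(2024, 1, 1),
          schedule_interval=None,   # run only when triggered manually
          catchup=False,
      ) as dag:
          hello = BashOperator(task_id="hello", bash_command="echo 'Hello, Airflow!'")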

  • @vikaskatiyar1120 • 3 years ago

    Can you please share the GitHub link for this example?

    • @kaysush • 3 years ago • +1

      Sample DAG : gist.github.com/kaysush/ade06ca3b4f42218f720e92e455c7b7b
      PySpark Code : gist.github.com/kaysush/65fdd9a5d5bb03a198d8fb1e23125bf1

    • @kalyandowlagar3901 • 2 years ago

      @@kaysush Hi, in this tutorial is the Airflow server outside GCP? If so, how is the connection established when you switch from Airflow to Dataproc, and from Dataproc to VS Code? I know VS Code can connect to a remote host using SSH, but how is the connection between Dataproc and the standalone Airflow server established?

  • @etgcrog1 • 2 years ago

    Where do I put the archive? In a bucket?
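
    A hedged guess at the answer: yes; artifacts a Dataproc job depends on, such as archives, are usually uploaded to a GCS bucket first and then referenced by gs:// URI in the job definition, for example (placeholder names):

    # archive_uris is a field of the Dataproc pyspark_job spec; the archives
    # are unpacked into the job's working directory on the cluster.
    PYSPARK_JOB = {
        "placement": {"cluster_name": "demo-cluster"},
        "pyspark_job": {
            "main_python_file_uri": "gs://my-bucket/jobs/main.py",
            "archive_uris": ["gs://my-bucket/deps/dependencies.zip"],
        },
    }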