Run Apache Spark jobs on serverless Dataproc

  • Published: 6 Sep 2024
  • Today, I'm excited to share a hands-on example of using a custom container to bundle all Spark job dependencies and execute it on serverless Dataproc. This powerful feature provides a streamlined approach to running Spark jobs without managing any infrastructure, while still offering advanced features like fine-tuned autoscaling, all without the cost of a constantly running cluster (see the sketch after the links below). #ApacheSpark #GoogleCloud #Serverless #Dataproc #BigData
    00:17 - Table of Contents
    01:19 - What is Dataproc?
    01:53 - Dataproc vs serverless Dataproc
    03:52 - Custom containers on Dataproc
    08:14 - A real-world use case
    11:33 - Code walkthrough
    20:43 - See it in Action!
    25:55 - Summary
    Useful links
    - code: github.com/roc...
    - slides: docs.google.co...
    - custom container: cloud.google.c...
    - serverless vs compute engine: cloud.google.c...
    - spark submit via REST: cloud.google.c...
    - service to service communication: cloud.google.c...
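
    A minimal sketch of what such a batch submission can look like using the google-cloud-dataproc Python client. All project, bucket, and image names below are hypothetical placeholders, and the autoscaling properties are illustrative values only:

    ```python
    # Hypothetical example: submit a PySpark batch to serverless Dataproc using a
    # custom container image that bundles the job's dependencies.
    from google.cloud import dataproc_v1

    REGION = "europe-west2"  # placeholder region

    # Batch requests go to the regional Dataproc endpoint.
    client = dataproc_v1.BatchControllerClient(
        client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
    )

    batch = dataproc_v1.Batch(
        pyspark_batch=dataproc_v1.PySparkBatch(
            main_python_file_uri="gs://my-bucket/jobs/etl_job.py",  # placeholder
        ),
        runtime_config=dataproc_v1.RuntimeConfig(
            # Custom image with all job dependencies baked in (placeholder path).
            container_image=f"{REGION}-docker.pkg.dev/my-project/spark/etl:latest",
            # Fine-tune autoscaling via Spark dynamic allocation properties.
            properties={
                "spark.dynamicAllocation.enabled": "true",
                "spark.dynamicAllocation.initialExecutors": "2",
                "spark.dynamicAllocation.maxExecutors": "10",
            },
        ),
    )

    operation = client.create_batch(
        parent=f"projects/my-project/locations/{REGION}",
        batch=batch,
        batch_id="etl-job-run-001",  # must be unique within the project/region
    )
    operation.result()  # wait for the batch to finish; no cluster to tear down
    ```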

Comments • 13

  • @Rising_Ballers
    @Rising_Ballers 2 months ago +2

    Hi Richard, love your content. I always wanted someone to do GCP training videos emphasizing real-world use cases. I work with BigQuery and Composer and wanted to learn Dataproc and Dataflow, but everywhere I see the same type of training, not much focused on real-world implementations. I want to learn how Dataproc and Dataflow jobs are deployed to different environments like dev, test, and prod. Your videos are helping a lot; I hope you will do more videos on Dataflow and Dataproc and how we create these jobs in real projects using CI/CD.

    • @practicalgcp2780
      @practicalgcp2780  2 months ago +1

      No worries, glad you found this useful ❤

    • @Rising_Ballers
      @Rising_Ballers 2 months ago

      @@practicalgcp2780 I have one doubt: in an organization with many Dataproc jobs, how would we create them in different environments like dev, test, and prod? Can you please do a video on that?

  • @paaabl0.
    @paaabl0. 9 months ago +2

    Awesome presentation! Far better than so much of the other, mostly self-promo, content out there.

    • @practicalgcp2780
      @practicalgcp2780  9 months ago +1

      Thanks so much ❤ glad you found it useful. The goal of this channel is to showcase ideas that can actually work well to solve real-world problems.

  • @RobertWestDavid
    @RobertWestDavid 1 year ago +2

    Thank you. This is really clear and well articulated. Data engineering content like this is hard to find on YouTube.

  • @user-op8ez7tb9l
    @user-op8ez7tb9l 6 months ago +1

    Thank you for the video; your content is easy to follow and quite well explained. I really enjoyed learning from the example workflow you presented.

    • @practicalgcp2780
      @practicalgcp2780  6 months ago

      Thank you for the nice words! I am glad you found this useful ❤

  • @zaykov91
    @zaykov91 1 year ago +3

    Solid video as always. +1 for a video on setting up Cloud Run with IAP.

    • @practicalgcp2780
      @practicalgcp2780  1 year ago

      Thanks Ivan, I will do that one in the next few weeks

  • @lexact1497
    @lexact1497 1 year ago +2

    Thanks for the video and for sharing it.

  • @ritwikverma2463
    @ritwikverma2463 2 months ago

    Hi Richard, can we create a Dataproc serverless job in a different GCP project using a service account?

    • @practicalgcp2780
      @practicalgcp2780  2 months ago

      I am not sure I understood you fully, but a service account can act in any project, regardless of which project it was created in. The way it works is by granting the service account IAM permissions in the project where you want the job to be created; then it will work. That may not be the best way to do it, though, as that one service account may end up with too much permission and scope. You can use separate service accounts, one per project, if you want to reduce scope, or have a master one that impersonates other service accounts in those projects. Keep in mind it's key to reduce the scope of what each service account can do; otherwise, when there is a breach, the damage across everything can be massive.
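
      A minimal sketch of the per-project pattern described above, with hypothetical project and service-account names. The caller needs permission to create Dataproc batches in the target project, and the job itself runs as a service account scoped to just that project:

      ```python
      # Hypothetical example: create a serverless Dataproc batch in a different
      # GCP project, running as a service account that lives in that project.
      from google.cloud import dataproc_v1

      TARGET_PROJECT = "analytics-prod"  # placeholder: project where the batch runs
      REGION = "europe-west2"            # placeholder region
      # One service account per project keeps the scope (and blast radius) small.
      JOB_SA = f"spark-runner@{TARGET_PROJECT}.iam.gserviceaccount.com"

      client = dataproc_v1.BatchControllerClient(
          client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
      )

      batch = dataproc_v1.Batch(
          pyspark_batch=dataproc_v1.PySparkBatch(
              main_python_file_uri=f"gs://{TARGET_PROJECT}-jobs/etl_job.py"  # placeholder
          ),
          environment_config=dataproc_v1.EnvironmentConfig(
              execution_config=dataproc_v1.ExecutionConfig(service_account=JOB_SA)
          ),
      )

      operation = client.create_batch(
          parent=f"projects/{TARGET_PROJECT}/locations/{REGION}",
          batch=batch,
          batch_id="cross-project-etl-001",
      )
      operation.result()  # block until the batch completes
      ```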