Run Apache Spark jobs on serverless Dataproc
- Published: 6 Sep 2024
- Today, I'm excited to share a hands-on example of using a custom container to bundle all of a Spark job's dependencies and execute it on serverless Dataproc. This powerful feature provides a streamlined way to run Spark jobs without managing any infrastructure, while still offering advanced capabilities like fine-tuned autoscaling, all without incurring the cost of a constantly running cluster. #ApacheSpark #GoogleCloud #Serverless #Dataproc #BigData
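As a rough sketch of what such a custom container might look like (the base image, package list, and image path are illustrative assumptions, not the video's actual code), a minimal Dockerfile that bundles the job's Python dependencies could be:

```dockerfile
# Illustrative only: base image and packages are assumptions.
FROM debian:12-slim

# Install Python plus the job's dependencies so the Spark driver and
# executors can find them inside the container at runtime.
RUN apt-get update && apt-get install -y --no-install-recommends \
      python3 python3-pip \
 && rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir pandas pyarrow

# Per the custom-container guide (linked below), Dataproc Serverless
# expects a "spark" user with UID/GID 1099 inside the image.
RUN groupadd -g 1099 spark \
 && useradd -u 1099 -g 1099 -d /home/spark -m spark
USER spark
```

Once the image is built and pushed to a registry, the batch could then be submitted with something like the following (region, bucket, and image path are hypothetical):

```shell
gcloud dataproc batches submit pyspark gs://my-bucket/job.py \
  --region=europe-west2 \
  --container-image=europe-west2-docker.pkg.dev/my-project/my-repo/spark-job:latest
```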
00:17 - Table of Contents
01:19 - What is Dataproc?
01:53 - Dataproc vs serverless Dataproc
03:52 - Custom containers on Dataproc
08:14 - A real-world use case
11:33 - Code walkthrough
20:43 - See it in Action!
25:55 - Summary
Useful links
- code: github.com/roc...
- slides: docs.google.co...
- custom container: cloud.google.c...
- serverless vs compute engine: cloud.google.c...
- spark submit via REST: cloud.google.c...
- service to service communication: cloud.google.c...
Hi Richard, love your content. I've always wanted someone to do GCP training videos emphasizing real-world use cases. I work with BigQuery and Composer, and I wanted to learn Dataproc and Dataflow, but everywhere I look I see the same type of trainings, not much focused on real-world implementations. I wanted to learn how Dataproc and Dataflow jobs are deployed to different environments like dev, test, and prod. Your videos are helping a lot. I hope you will do more videos on Dataflow and Dataproc, and on how we create these jobs in real projects using CI/CD.
No worries, glad you found this useful ❤
@practicalgcp2780 I have one doubt: in an organization, if we have many Dataproc jobs, how do we create them in different environments like dev, test, and prod? Can you please do a video on that?
Awesome presentation! Far better than so much other, mostly self-promo, content
Thanks so much ❤ glad you found it useful. The goal of this channel is to showcase ideas that can actually work well to solve real-world problems.
thank you. This is really clear and well articulated. Hard to find on youtube Data Eng stuff
Thank you for the video, your content is easy to follow and quite well explained. I really enjoyed learning the example workflow you presented.
Thank you for the nice words! I am glad you found this useful ❤
Solid video as always. +1 for a video on setting up Cloud Run with IAP.
Thanks Ivan, I will do that one in the next few weeks
Thanks for the video and for sharing it.
Hi Richard, can we create a Dataproc Serverless job in a different GCP project using a service account?
I am not sure I understood you fully, but a service account can act in any project regardless of which project it was created in. The way it works is by granting the service account IAM permissions in the project where you want the job to be created; then it will work. But that may not be the best way to do it, as that one service account may end up with too much permission and scope. You can use separate service accounts, one per project, if you want to reduce scope, or have a master one that impersonates other service accounts in those projects. Keep in mind that it's key to reduce the scope of what each service account can do; otherwise, when there is a breach, the damage across everything can be massive.
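To make the two options above concrete, here is a hedged sketch using hypothetical project and service-account names (project-a, project-b, and the account emails are all made up; the roles shown are real predefined GCP roles, but your organization may prefer narrower custom roles):

```shell
# Option 1: grant a service account from project-a permission to
# create Dataproc Serverless batches in project-b.
gcloud projects add-iam-policy-binding project-b \
  --member="serviceAccount:spark-runner@project-a.iam.gserviceaccount.com" \
  --role="roles/dataproc.editor"

# Option 2: keep a per-project service account in project-b and let a
# central deployer account impersonate it, so each account's scope
# stays limited to its own project.
gcloud iam service-accounts add-iam-policy-binding \
  spark-runner@project-b.iam.gserviceaccount.com \
  --member="serviceAccount:deployer@central-project.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountTokenCreator"
```

With option 2, the central account never holds broad permissions itself; a compromise of it can be contained by revoking the impersonation bindings per project.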