34. Databricks - Spark: Data Skew Optimization

Raja's Data Engineering

29K views


Published: Nov 8, 2024

Comments • 33

@Prashanth-os5he 1 year ago ⁺⁴
This is by far the best Databricks and Spark tutorial series on YouTube... great job, Raja
@rajasdataengineering7585 1 year ago
Glad you think so! Thanks for your comment
@sumanmondal8836 2 years ago ⁺³
Thanks, Raja, your explanations are really good. Can you please make a video on salting techniques with an example? It will be very helpful.
@rajasdataengineering7585 2 years ago
Thank you Suman. Sure, will make a video on salting
@abhinavsingh1173 1 year ago ⁺²
Your course is the best. But the problem with your course is that you are not attaching the GitHub link for your sample data and code. I request you, as your audience, to please do this. Thanks
@skasifali4457 2 years ago ⁺²
Thanks Raja. Your video is really useful. Can you please create a video on debugging techniques and how we can use the Spark UI to debug and understand bottlenecks, using use cases? Thanks a lot again
@rajasdataengineering7585 2 years ago ⁺²
Sure Asif, will post a video on debugging
@swapnilgosawi 2 months ago
Do you have a document with all these details? If yes, it would be great to share it on Git. Really great explanation. Thank you!!
@srinubathina7191 1 year ago ⁺¹
Awesome content. Thank you so much, Sir
@rajasdataengineering7585 1 year ago
Glad you liked it
@joyo2122 2 years ago ⁺¹
You are the best Raja 🙌
@iamkiri_ 11 months ago ⁺¹
Thanks for the video. I have a question: is the salting technique applied while reading the data from the source, or during intermediate processing in the application?
@rajasdataengineering7585 11 months ago
It is applied during the transformation stage, not at data extraction
@iamkiri_ 11 months ago
Thanks Bro
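
Editor's note: the reply above places salting in the transformation stage, that is, just before the shuffle that groups or joins on the skewed key. The following is a minimal plain-Python sketch of that idea, not the video author's code. The partition count, salt count, row counts, and the crc32-based `partition_for` helper are all assumptions made for illustration; Spark's real hash partitioner differs.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical shuffle partition count
NUM_SALTS = 8       # hypothetical number of salt buckets


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable stand-in for a hash partitioner (not Spark's actual hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def salt_key(key: str, rng: random.Random, num_salts: int = NUM_SALTS) -> str:
    """Append a random suffix so one hot key becomes several sub-keys."""
    return f"{key}_{rng.randrange(num_salts)}"


rng = random.Random(42)

# Skewed input: 80% of the rows share the key "Germany".
rows = (
    ["Germany"] * 8000
    + ["France"] * 500
    + ["India"] * 500
    + ["Japan"] * 500
    + ["Brazil"] * 500
)

# Rows per partition when shuffling on the raw key vs. the salted key.
unsalted = Counter(partition_for(k) for k in rows)
salted = Counter(partition_for(salt_key(k, rng)) for k in rows)

print("largest partition without salting:", max(unsalted.values()))
print("largest partition with salting:   ", max(salted.values()))
```

Without the salt, every "Germany" row hashes to the same partition; with it, the rows are spread over the partitions that the salted sub-keys hash to.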
@SaurabhDestiny18 1 year ago
Hi, thank you for such useful videos. I have one question: I am still confused about the executor boundary versus the cores/tasks boundary. In your first video you mentioned that an executor can have many cores and RAM, and in this video you mention that each executor runs in its own JVM process. Does that mean all the cores/tasks run under one JVM process? Or are there additional JVM processes running under that parent JVM process, equal in number to the cores/tasks?
@Personalcomments 2 years ago ⁺¹
Your videos are very informative. Can you please post a video on client mode vs cluster mode vs local mode?
@rajasdataengineering7585 2 years ago
Sure Merin, will post the video on this topic
@VishalSharma-hv6ks 2 years ago ⁺²
You mainly focus on theory. It would be great if you wrote the code for salting as well.
@rajasdataengineering7585 2 years ago ⁺¹
Sure, will post another video with a coding example
@rajunaik8803 1 year ago ⁺¹
Hi Raja, quick question: does AQE take care of the salting and skew hint techniques automatically in case of data skewness?
Or do we have to apply them explicitly?
@rajasdataengineering7585 1 year ago ⁺¹
Yes, AQE handles data skewness automatically. In later Spark versions after 3.0, it is enabled by default. For prior versions of Spark, we just need to enable AQE through the Spark config settings
@rajunaik8803 1 year ago
@rajasdataengineering7585 Thanks a lot for your response. Do you have a Telegram channel? And may I know your LinkedIn ID, please?
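
Editor's note on the AQE exchange above: AQE was introduced in Spark 3.0 and has been on by default since Spark 3.2.0. Its built-in skew handling targets skewed partitions in sort-merge joins, so other skewed operations may still need techniques such as salting. The sketch below collects the relevant configuration keys. The keys are standard Spark SQL settings; the `AQE_SETTINGS` dict, the `apply_aqe_settings` helper, and the chosen values are illustrative assumptions, and `spark` is assumed to be an existing SparkSession supplied by the caller.

```python
# Standard Spark SQL configuration keys; the values shown are illustrative.
AQE_SETTINGS = {
    # Master switch for Adaptive Query Execution.
    "spark.sql.adaptive.enabled": "true",
    # Lets AQE split skewed partitions in sort-merge joins.
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # A partition is treated as skewed when it is larger than this factor
    # times the median partition size AND larger than the byte threshold.
    "spark.sql.adaptive.skewJoin.skewedPartitionFactor": "5",
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "256MB",
}


def apply_aqe_settings(spark) -> None:
    """Apply the settings to a SparkSession passed in by the caller."""
    for key, value in AQE_SETTINGS.items():
        spark.conf.set(key, value)
```

In a Databricks notebook, where a session named `spark` already exists, this would be called as `apply_aqe_settings(spark)`.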
@sravankumar1767 2 years ago ⁺¹
Superb
@rajasdataengineering7585 2 years ago
Thank you
@naveenkumarsingh3829 5 months ago
Why can't we set maxPartitionBytes to get equal-sized partitions and handle data skewness?
@tanushreenagar3116 2 years ago ⁺¹
nice
@rajasdataengineering7585 2 years ago
Thanks
@balakrishna61 6 months ago
@rajasdataengineering7585 Please explain salting in detail. It's not clear how you partition German-1, _2, and so on. Each record will become one partition in this case, correct?
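
Editor's note on the question above: in the usual form of salting, the suffix is drawn from a small fixed range, so a hot key turns into a handful of sub-keys rather than one key per record. The following plain-Python sketch shows a two-stage salted aggregation under that assumption. `NUM_SALTS`, the row counts, and the `Country_N` key format are hypothetical choices for illustration, not taken from the video.

```python
import random
from collections import Counter

NUM_SALTS = 4  # hypothetical; a real job would tune this to the skew

rng = random.Random(0)
rows = [("Germany", 1)] * 1000 + [("France", 1)] * 50

# Stage 1: aggregate on the salted key. The salt comes from a small fixed
# range, so "Germany" becomes at most NUM_SALTS sub-keys, not one per row.
partial = Counter()
for country, amount in rows:
    salted_key = f"{country}_{rng.randrange(NUM_SALTS)}"
    partial[salted_key] += amount

# Stage 2: strip the salt and combine the partial results per original key.
final = Counter()
for salted_key, subtotal in partial.items():
    country = salted_key.rsplit("_", 1)[0]
    final[country] += subtotal

germany_sub_keys = sorted(k for k in partial if k.startswith("Germany_"))
print(germany_sub_keys)
print(dict(final))
```

The first stage does the heavy work in parallel across the sub-keys; the second stage only combines a few subtotals per country, so the totals match what an unsalted aggregation would return.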
@prathapganesh7021 7 months ago ⁺¹
thank you
@rajasdataengineering7585 7 months ago
Welcome!
@sanskarsuman9340 2 years ago ⁺¹
I have a doubt:
when you say data is partitioned on country and there are five different countries, of which, let's say, Germany has 80% of the data, how can we say that the Germany data is in a single partition only? A partition is determined by the block size, with 1 partition = 128 MB, so depending on its size, couldn't the Germany data be split into multiple partitions automatically?
@ndbweurt34485 1 year ago
I had the same question
@supriyakoura7755 3 months ago
Same question
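
Editor's note on the partition-size questions above: one way to reconcile them is to separate input partitions from shuffle partitions. When files are read, splits are cut by size (governed by `spark.sql.files.maxPartitionBytes`, 128 MB by default), so a large "Germany" dataset does span many input partitions. After a shuffle keyed on country, for example a groupBy or a join, every row with the same key is hashed to the same shuffle partition, and that is where the skew appears. The plain-Python sketch below illustrates this with hypothetical sizes. It ignores file open cost and core count, which Spark also considers when sizing read splits, and uses crc32 as a stand-in for Spark's hash.

```python
import math
import zlib
from collections import Counter

MB = 1024 * 1024
MAX_PARTITION_BYTES = 128 * MB  # default of spark.sql.files.maxPartitionBytes
SHUFFLE_PARTITIONS = 200        # default of spark.sql.shuffle.partitions

# Hypothetical data volume per country, in bytes (Germany holds 80%).
data_bytes = {
    "Germany": 8000 * MB,
    "France": 500 * MB,
    "India": 500 * MB,
    "Japan": 500 * MB,
    "Brazil": 500 * MB,
}


def shuffle_partition(key: str) -> int:
    """Stable stand-in for hash partitioning on the shuffle key."""
    return zlib.crc32(key.encode("utf-8")) % SHUFFLE_PARTITIONS


# Reading files: splits are cut by size, regardless of the key.
input_partitions = math.ceil(sum(data_bytes.values()) / MAX_PARTITION_BYTES)

# After a shuffle on country: all bytes of one key land in one partition.
shuffled = Counter()
for country, size in data_bytes.items():
    shuffled[shuffle_partition(country)] += size

print("input partitions:", input_partitions)
print("non-empty shuffle partitions:", len(shuffled))
print("largest shuffle partition (MB):", max(shuffled.values()) // MB)
```

Under these assumptions, the read stage is evenly sized, while the shuffle stage leaves at most five non-empty partitions, one of which carries all of the Germany data. This is also why tuning `maxPartitionBytes` alone does not remove skew after a shuffle.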
