One thing to note, at least from what I was told on the dbx performance course, is that when AQE is on, the shuffle.partitions setting still acts as your ceiling.
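For anyone who wants to see that ceiling in practice, here's a minimal PySpark sketch of my own (not from the course) - with coalescing on, AQE starts from spark.sql.shuffle.partitions (unless you raise spark.sql.adaptive.coalescePartitions.initialPartitionNum) and can only merge partitions downwards from there:

from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("aqe-ceiling-sketch")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    # The ceiling: AQE's initial post-shuffle partition count defaults to this
    # value, and coalescing can only merge below it, never grow above it.
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)

# A small aggregation whose shuffle output AQE can coalesce well below 200.
df = spark.range(1_000_000).withColumn("key", F.col("id") % 10)
agg = df.groupBy("key").count()
agg.collect()
print(agg.rdd.getNumPartitions())  # typically far fewer than 200 with AQE on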
This is the most enjoyable video on Spark execution plans - I never thought in my wildest dreams I would enjoy listening to someone talk about data skew, execution plans and query engines.
I definitely need to follow this guy!
Oops! I should not have said engine ... I meant execution :-)
Wow...Just Perfect...Subscribed.
Exactly what I needed! Thanks for this video! I was struggling with building a big datamart with lots of joins to huge tables (600M+). Even with an 8-node 56GB cluster, it didn't finish after 1.5hrs and I had to cancel the query. As per the explain plan it was doing lots of Sort Merge Joins, with execution hanging on 1 executor doing all the work. I even tried computing table/column stats, but it still didn't help. With Adaptive Query Execution the SQL ran in 28min!!!
Fantastic news - glad it helped! I expect we'll see quite a few similar stories as people discover the config option, switch it on and watch their queries suddenly speed up! Definitely need to investigate to see if there are any potential downsides, but looks like a winner so far!
Simon
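For anyone hitting the same pattern (sort-merge joins everywhere, one executor doing all the work), these are the standard AQE switches most relevant to that story - a hedged sketch assuming Spark 3.x and an existing session named spark, not a prescription:

# Let AQE re-plan at runtime (e.g. convert sort-merge joins to broadcast):
spark.conf.set("spark.sql.adaptive.enabled", "true")
# Split oversized shuffle partitions so one executor isn't stuck with everything:
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
# Merge the many tiny post-shuffle partitions into sensibly sized ones:
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")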
@AdvancingAnalytics Hi, have you found any downsides after 9 months of use?
Great video.
Do I need to turn it on manually in 2024?
Nope - turned on by default since Spark 3.2.0 (released end of 2021)!
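Quick way to confirm on your own cluster (a sketch assuming an existing session named spark):

print(spark.conf.get("spark.sql.adaptive.enabled"))  # 'true' by default on Spark >= 3.2.0
# On 3.0/3.1 runtimes you still flip it on yourself:
spark.conf.set("spark.sql.adaptive.enabled", "true")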
Thanks for the video!
Any idea why Spark AQE creates multiple jobs, instead of the multiple stages it used to in the past...?
You mentioned the "jobs" but with no explanation... :)
@Advanced Analytics Can you make a video on Dynamic Partition Pruning in Spark? It is quite confusing.
Yep, there's a DPP vid from a while ago - ruclips.net/video/-86iMCKeYxI/видео.html.
Let us know if that answers your questions!
@AdvancingAnalytics Yes, that was a great video.
@AdvancingAnalytics I have added a few questions on the DPP vid. Can you please have a look at them?
Great video and thanks for sharing!! I'm currently using Delta Lake on Databricks 2.4.5 and considering migrating to 3.x, do you know if AQE will bring speed enhancements when working with Delta files?
Absolutely - the stats collected by Delta give you some speed boosts from dynamic partition pruning in Spark 3, which is a huge boost in itself. AQE should help any time there's a shuffle/exchange - so if you're joining tables, performing wide transformations etc, you should see the benefits.
The only gotcha I found when jumping up from Spark 2.4 to 3.x was the shift in date compatibility (i.e. strict case sensitivity on date format strings). Otherwise, it was a pain-free upgrade with a few nice performance gains.
Simon
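To make that date gotcha concrete, a small sketch of my own (not from the video): Spark 3's new parser treats pattern letters case-sensitively (e.g. 'd' is day-of-month but 'D' is day-of-year), so a 2.4-era format string can fail or silently change meaning, and there's a legacy escape hatch while you migrate:

from pyspark.sql import functions as F

df = spark.createDataFrame([("2020-06-01",)], ["raw"])
# The correct lower-case pattern parses fine on both 2.4 and 3.x:
df.select(F.to_date("raw", "yyyy-MM-dd").alias("d")).show()
# A sloppy pattern like "YYYY-MM-DD" that 2.4 tolerated errors on 3.x;
# this restores the old lenient parser while you fix the format strings:
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")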
@AdvancingAnalytics Thanks for this, much appreciated.
Hi, what exactly is 26 here? I get the 200 default partitions that the cluster creates with the data, but I'm not sure what the "26" is... thank you!
Hey - it's been a while since I ran the demo, but from what I recall the "sales" table that I'd saved down was from a dataframe with 26 RDD blocks, so it had created 26 parquet files within the table. In any of the examples going back to that table, it needs 26 tasks, one to read each of those files, before we can do any of the actual shuffle/exchange steps that AQE is actually helping us with.
Simon
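If you want to check the equivalent number on your own table, a quick sketch (the "sales" name is from the demo; the rest is my own illustration):

# The scan's task count generally follows the file count for small files:
sales = spark.table("sales")
print(sales.rdd.getNumPartitions())  # 26 in the demo - one per parquet file
# And when writing, the dataframe's partition count sets the file count:
# df.repartition(26).write.mode("overwrite").saveAsTable("sales")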
I have a window operation on 8M records, and even with AQE enabled it is causing a data skew issue. Any idea how to resolve it?
That's always gonna be an "it depends" :)
If the window is partitioned by a skewed key, and there's no way you can break the partition down another way, you're going to get elements of skew during the stage that performs that transform. You could potentially use bucketing to collect some of the smaller partitions together - but if the issue is around one partition being way too big, it's not going to help too much.
That said, 8M records isn't a crazy amount of data to just brute force through, unless it's an issue trying to serve data directly into dashboards etc that needs the faster execution time? If that's the case, you can always go for a larger worker size, but fewer nodes, so at least you're not hitting any memory boundaries?
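If it helps anyone facing the same thing, a tiny diagnostic sketch (df and "partition_key" are placeholders for your dataframe and window key):

from pyspark.sql import functions as F

# Count rows per window key to see how lopsided the partitions really are;
# one dominant key confirms skew that AQE can't split inside a window.
(df.groupBy("partition_key")
   .count()
   .orderBy(F.desc("count"))
   .show(10))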