Apache Spark | Spark Interview Question | Spark Optimization { PartitionBy & Repartition }

Azarudeen Shahul

19K views

Published: 15 Sep 2024
Apache Spark is a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
⭐ Kite is a free AI-powered coding assistant for Python that will help you code smarter and faster. Integrates with Atom, PyCharm, VS Code, Sublime, Vim, and Spyder. I've been using Kite for 6 months and I love it! www.kite.com/g....
The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you're typing. I've been using Kite for 6 months, and I love it!
-----------------------------------------------------------
Apache Spark Interview Question - In this video, we learn how to answer the interview question "What is the difference between partitionBy and repartition in Apache Spark?" We walk through these Spark optimization techniques with a small demo.
-----------------------------------------------------------
Dataset for you to work with:
github.com/aza...
Blog link to learn more on Spark:
www.learntospark.com
LinkedIn profile:
/ azarudeen-s-83652474
FB page:
/ learntospark-104523781...
#apachespark #spark #sparkoptimization

Comments • 25

@akashprabhakar6353 3 months ago
Awesome video. Greatly explained
@nareshvemula2204 3 years ago ⁺¹
Good work, Azar. I have gone through your videos from the last year, and there are a lot of them. Could you please make one consolidated series of big data engineer interview questions and answers, starting from HDFS, Sqoop, Hadoop, Apache Spark, Hive, Impala, etc.? Similarly, one more series of lectures for beginners to understand each concept in detail. If this is too big a task, could you please provide a list of sources and a learning path, with a clear-cut strategy and resources for both interviews and learning, and create separate playlists for them?
@occasionalvisitor6920 4 months ago
Hi, you should clearly explain why the source file shows only 1 partition after reading. Simply saying that the file is small does not give the audience clarity. Technically, many factors can influence how many partitions are created, such as parallelism, block size, input format and splitting logic, data locality, and cluster configuration.
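Editor's note: the point about partition counts can be made concrete with a rough model. The sketch below is plain Python, not Spark code. It approximates how the DataFrame file source sizes its splits, assuming one splittable file and the default values of `spark.sql.files.maxPartitionBytes` (128 MB) and `spark.sql.files.openCostInBytes` (4 MB). The function name `estimate_input_partitions` is hypothetical, and the model ignores multiple files, non-splittable formats such as gzip, and the RDD API, so treat the numbers as an approximation.

```python
import math

MB = 1024 * 1024


def estimate_input_partitions(file_bytes, default_parallelism,
                              max_partition_bytes=128 * MB,
                              open_cost_bytes=4 * MB):
    """Rough estimate of read partitions for ONE splittable file (toy model)."""
    # Spread the data, plus a per-file "open cost", across the available cores.
    bytes_per_core = (file_bytes + open_cost_bytes) // default_parallelism
    # The split size is capped at max_partition_bytes and floored at the open cost.
    max_split_bytes = min(max_partition_bytes,
                          max(open_cost_bytes, bytes_per_core))
    return max(1, math.ceil(file_bytes / max_split_bytes))


small = estimate_input_partitions(1 * MB, default_parallelism=8)
large = estimate_input_partitions(1024 * MB, default_parallelism=8)
print(small, large)  # prints "1 8"
```

Under these assumptions, a 1 MB file fits inside a single 4 MB split, so it is read as one partition regardless of how many cores are available.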
@vipinkumarjha5587 3 years ago
Superbly explained... Thanks for sharing the knowledge
@prajjwaljaglan1092 3 years ago
Very informative and helpful videos... Thanks for sharing the knowledge 👍👍
@AzarudeenShahul 3 years ago
Thanks for your support :)
@Balajionceagain 3 years ago
Thanks Azar. It’s really helpful 👍
@Humanist1199 3 years ago
Excellent Azar
@dhananjayreddy9998 2 years ago
After watching the complete video, I understood that repartition should be used when reading the file and partitionBy should be used when writing. repartition actually divides the data into the number of partitions passed to it.
But I didn't get clarity on partitionBy. Could someone please explain it?
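Editor's note: a small model may help with this question. The sketch below is plain Python, not Spark code. It mimics the Hive-style directory layout that a write such as `df.write.partitionBy("country").parquet(path)` produces: one `column=value` subdirectory per distinct value, with the partition column carried in the directory name instead of in the data files. The function `partition_by_layout` and the sample rows are hypothetical.

```python
from collections import defaultdict


def partition_by_layout(rows, column):
    """Group rows into Hive-style 'column=value' directory names (toy model)."""
    layout = defaultdict(list)
    for row in rows:
        # The partition column moves into the directory name,
        # so it is dropped from the stored record.
        remaining = {k: v for k, v in row.items() if k != column}
        layout[f"{column}={row[column]}"].append(remaining)
    return dict(layout)


rows = [
    {"name": "a", "country": "IN"},
    {"name": "b", "country": "US"},
    {"name": "c", "country": "IN"},
]
layout = partition_by_layout(rows, "country")
for directory in sorted(layout):
    print(directory, layout[directory])
```

In real Spark, each in-memory partition that holds rows for a given value writes its own file inside that subdirectory, which is why repartition and partitionBy are often combined before a write.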
@sravankumar1767 2 years ago
Superb
@ferrerolounge1910 1 year ago
Why do we need files in memory (repartition use)? Aren't files always used when storing on disk?
@deepjyotimitra1340 2 years ago
very nice explanation. 👌 👍
@AzarudeenShahul 2 years ago
Thanks for your support
@bikersview9926 2 years ago
Voice was so good, Azar
@SpiritOfIndiaaa 2 years ago
Thank you, please share the notebooks
@shilpasthavarmath5262 3 years ago
Useful..
@153dravid 2 years ago
Hi Azar, which is faster (repartition vs partitionBy vs coalesce) if we are dealing with 1 TB of data?
Thank you
@localmartian9047 2 years ago ⁺¹
They do different things. partitionBy is for writing the df into separate subdirectories based on the partition column, while repartition and coalesce deal with the distribution of data in memory among executors. coalesce is used to reduce the number of partitions and tries to avoid a full reshuffle, so it will be faster than repartition, which can both increase and decrease partitions but does so with a full reshuffle. But if you reduce the number of partitions so far that each partition exceeds an executor's capacity, it can lead to OOM. So it depends on the problem you are solving, i.e. whether you want to reduce skew or distribute the data more widely.
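Editor's note: the trade-off described here can be illustrated with a toy model. The sketch below is plain Python, not Spark code. It treats a dataset as a list of partitions: `coalesce` merges whole partitions without moving individual rows, and `repartition` deals every row round-robin into new partitions, standing in for a full shuffle. Both functions are hypothetical simplifications of `df.coalesce(n)` and `df.repartition(n)`, and the grouping rule inside `coalesce` is an assumption made for illustration.

```python
def coalesce(partitions, n):
    """Merge whole partitions into n groups; rows are never redistributed."""
    n = min(n, len(partitions))  # coalesce cannot increase the partition count
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Neighbouring partitions are combined as whole units.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Full shuffle: every row is dealt round-robin into n new partitions."""
    rows = [row for part in partitions for row in part]
    shuffled = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        shuffled[i % n].append(row)
    return shuffled


skewed = [[1, 2, 3, 4, 5, 6], [7], [8], [9]]
print([len(p) for p in coalesce(skewed, 2)])     # [7, 2]: skew is preserved
print([len(p) for p in repartition(skewed, 2)])  # [5, 4]: rows are rebalanced
print([len(p) for p in repartition(skewed, 6)])  # [2, 2, 2, 1, 1, 1]
```

The model shows why coalesce is cheaper but can leave partitions uneven, while repartition pays for moving every row and returns evenly sized partitions.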
@creativeminds7397 3 years ago ⁺¹
Hello Azarudeen, your videos are awesome.
I have one question: can you please provide me the code? I want to decrypt files using a private key. All my PGP-encrypted files and the private key are stored in an S3 bucket. Please help me with the code.
@AzarudeenShahul 3 years ago
If possible, can you please share a sample file?
@ayyappaappu7265 3 years ago
Bro, is partitionBy an action or a transformation?
@ankbala 2 years ago ⁺¹
partitionBy is not related to RDD. It's neither an action nor a transformation. Right, Azar?
@dataengineering3304 2 years ago
partitionBy is a method of DataFrameWriter, not of DataFrame, so it is neither an action nor a transformation.
