Spark Interview Question | Bucketing | Spark SQL

  • Published: 30 Apr 2020
  • #Apache #Spark #SparkSQL #Bucketing
    Please join my channel as a member to get additional benefits like materials in Big Data and Data Science, live streams for members, and more.
    Click here to subscribe : / @techwithviresh
    About us:
    We are a technology consulting and training provider, specializing in areas such as Machine Learning, AI, Spark, Big Data, NoSQL, graph DBs, Cassandra, and the Hadoop ecosystem.
    Mastering Spark : • Spark Scenario Based I...
    Mastering Hive : • Mastering Hive Tutoria...
    Spark Interview Questions : • Cache vs Persist | Spa...
    Mastering Hadoop : • Hadoop Tutorial | Map ...
    Visit us:
    Email: techwithviresh@gmail.com
    Facebook : / tech-greens
    Twitter : @TechViresh
    Thanks for watching
    Please Subscribe!!! Like, share and comment!!!!

Comments • 29

  • @vishalaaa1
    @vishalaaa1 1 year ago +1

    nice

  • @cajaykiran
    @cajaykiran 2 years ago

    Thank you

  • @eknathsatish7502
    @eknathsatish7502 3 years ago +1

    Excellent..

  • @gauravbhartia7543
    @gauravbhartia7543 4 years ago +1

    Nicely Explained.

  • @aashishraina2831
    @aashishraina2831 3 years ago +1

    excellent

  • @RAVIC3200
    @RAVIC3200 4 years ago +2

    Again a great content video. Viresh, can you make a video on the scenarios interviewers usually ask, like:
    1) If you have a 1 TB file, how much time does it take to process (take any standard cluster setup configuration to explain), and if I reduce it to 500 GB, how much time will it take?
    2) DAG-related scenario questions?
    3) If a Spark job fails in the middle, will it start from the beginning if you re-trigger it? If not, why?
    4) Checkpoint-related scenarios.
    Please try to cover such scenarios; if it is all inside one video it will be really helpful. Thanks again for such videos.

    • @TechWithViresh
      @TechWithViresh  4 years ago

      Thanks, don’t forget to subscribe.

    • @RAVIC3200
      @RAVIC3200 4 years ago +1

      @@TechWithViresh I'm your permanent viewer 🙏🙏

  • @dipanjansaha6824
    @dipanjansaha6824 4 years ago

    1. When we write the files directly to ADLS, how does bucketing help?
    2. Also, is it a correct understanding that bucketing is good when we use a DataFrame for read purposes only? As I understand it, if there is a use case where a write operation happens on every build, bucketing would not be the best approach.

    • @TechWithViresh
      @TechWithViresh  4 years ago

      Yes, bucketing is more effective for reusable tables involved in heavy joins.
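
      A minimal sketch of that pattern (assuming a Spark session with Hive support; table and column names here are illustrative, not from the video):

        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder()
          .appName("BucketedJoin")
          .enableHiveSupport() // bucket metadata is kept in the metastore
          .getOrCreate()

        // Two example DataFrames sharing a join key.
        val orders = spark.range(1000)
          .selectExpr("id AS order_id", "id % 100 AS customer_id")
        val customers = spark.range(100)
          .selectExpr("id AS customer_id", "concat('name_', id) AS name")

        // Persist both sides bucketed (and sorted) on the join key,
        // with the same bucket count on each side.
        orders.write.bucketBy(4, "customer_id").sortBy("customer_id")
          .saveAsTable("orders_b")
        customers.write.bucketBy(4, "customer_id").sortBy("customer_id")
          .saveAsTable("customers_b")

        // Joining on the bucket column lets Spark plan a sort-merge join
        // with no Exchange (shuffle) on either side.
        val joined = spark.table("orders_b")
          .join(spark.table("customers_b"), "customer_id")
        joined.explain() // the plan should show no Exchange below the join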

  • @bhushanmayank
    @bhushanmayank 3 years ago

    How does Spark know that the other table is bucketed on the same attribute when joining?
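
    Spark reads the bucket spec from the table metadata in the catalog: saveAsTable (or CREATE TABLE ... CLUSTERED BY) records the bucket columns and bucket count, and the planner compares the specs of both join sides. A quick way to see that metadata (a sketch, reusing the orders_b table from the example above):

      // The bucket spec is stored with the table definition; DESCRIBE EXTENDED
      // surfaces it as "Num Buckets" and "Bucket Columns" rows.
      spark.sql("DESCRIBE EXTENDED orders_b").show(100, truncate = false)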

  • @gunishjha4030
    @gunishjha4030 3 years ago

    Great content!!! You used bucketBy in the Scala code to make the changes; can you tell how to handle the same in Spark SQL as well? Do we have any function we can pass in Spark SQL for the same?

    • @gunishjha4030
      @gunishjha4030 3 years ago

      Found it, thanks anyway: PARTITIONED BY (favorite_color)
      CLUSTERED BY (name) SORTED BY (favorite_numbers) INTO 42 BUCKETS;
      (see the complete statement sketched after this thread)

    • @mdfurqan
      @mdfurqan 1 year ago +1

      @@gunishjha4030 but are you able to insert data into the bucketed table using spark-sql when the underlying storage is Hive?
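
      For reference, the complete form of that statement, following the example in the Spark SQL data sources documentation (table and column names come from that docs example), can also be run through spark.sql:

        spark.sql("""
          CREATE TABLE users_bucketed_and_partitioned (
            name STRING,
            favorite_color STRING,
            favorite_numbers ARRAY<INT>
          ) USING parquet
          PARTITIONED BY (favorite_color)
          CLUSTERED BY (name) SORTED BY (favorite_numbers) INTO 42 BUCKETS
        """)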

  • @SpiritOfIndiaaa
    @SpiritOfIndiaaa 2 years ago

    Can you please share the notebook URL? Thanks a lot, really great learnings.

  • @mateen161
    @mateen161 4 years ago

    Nice explanation! Just wondering how the number of buckets should be decided. In this example you used 4 buckets; can't we use 6 or 8 or 10? Is there a specific reason for using 4 buckets?

    • @TechWithViresh
      @TechWithViresh  4 years ago

      It can be any number, depending on your data and bucket column

  • @sachink.gorade8209
    @sachink.gorade8209 4 years ago

    Hello Viresh sir, nice explanation. Just one thing I did not understand: when do we create the 8 partitions for these two tables? I could not find any code for it in the video. Could you please explain?

    • @TechWithViresh
      @TechWithViresh  4 years ago +1

      8 is the default number of partitions (round robin) for the 8-node cluster used here.
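
      A quick way to inspect that default (a sketch; the actual value depends on the total cores available to the cluster, so it is not always the node count):

        // defaultParallelism drives the initial partition count of
        // spark.range / parallelize when none is specified explicitly.
        println(spark.sparkContext.defaultParallelism)
        println(spark.range(0, 1000).rdd.getNumPartitions)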

  • @cajaykiran
    @cajaykiran 2 years ago +1

    Is there any way I can reach out to you to discuss something important?

    • @TechWithViresh
      @TechWithViresh  2 years ago

      Send the details at techwithviresh@gmail.com.

  • @himanshusekharpaul476
    @himanshusekharpaul476 4 years ago

    Hey, nice explanation. But I have one doubt: in the above video you set the number of buckets to 4. What criteria should we keep in mind while deciding the number of buckets in real time? Is there any formula or bucket-size constraint? Could you please help?

    • @TechWithViresh
      @TechWithViresh  4 years ago

      The idea behind these two data distribution techniques, partitioning and bucketing, is to distribute the data evenly and at an optimum size that can be processed effectively in a single task (see the sizing sketch after this thread).

    • @himanshusekharpaul476
      @himanshusekharpaul476 4 years ago

      OK. What is the optimum bucket size that can be processed by a single task?
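
      The video does not give a number; a commonly cited rule of thumb (an assumption here, not from the video) is to target roughly 128-256 MB per bucket file, which turns into a back-of-the-envelope formula:

        // Hypothetical sizing sketch: bucket count ≈ table size / target bucket size.
        val tableSizeBytes   = 100L * 1024 * 1024 * 1024 // e.g. a 100 GB table
        val targetBucketSize = 256L * 1024 * 1024        // ~256 MB per bucket file
        val numBuckets = math.max(1L, tableSizeBytes / targetBucketSize) // = 400 here
        println(numBuckets)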

  • @aashishraina2831
    @aashishraina2831 3 years ago

    I think this video is repeated above. It can be deleted.