Spark Scenario Based Question | Alternative to df.count() | Use Case For Accumulators | learntospark
- Published: 11 Jan 2025
- In this video, we discuss a scenario-based question on Spark: an alternative way to find the row count of a DataFrame in a Spark application, and the use case for accumulators, shown through a short demo.
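Roughly, the accumulator idea from the demo in PySpark (a minimal sketch; the file path, app name, and variable names are illustrative, not taken from the video):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("accumulator-count").getOrCreate()
df = spark.read.csv("employees.csv", header=True)  # hypothetical input file

# The accumulator starts at 0 on the driver; each executor task adds to it.
row_count = spark.sparkContext.accumulator(0)

# foreach is an action, so one pass over the data both touches every row
# and tallies the count, instead of running a separate df.count() job.
df.foreach(lambda row: row_count.add(1))

# Only the final number travels back to the driver, never the rows.
print("row count:", row_count.value)

In practice you would piggyback the accumulator on an action you already run (a write, for example), so the count comes essentially for free.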
DataSet:
github.com/aza...
Blog link to learn more on Spark:
www.learntospark.com
Linkedin profile:
/ azarudeen-s-83652474
FB page:
/ learntospark-104523781...
Can you please make these kinds of videos more frequently?
Well explained 👏 👍
Please provide Databricks interview Q&A.
Interview Q&A videos are lined up this year. Stay tuned. Hope you find it useful. Thanks
Can you make a video on Spark partitioning: how Spark decides the number of partitions while reading data and during shuffle operations, and how to choose and change the number of partitions?
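These are standard Spark knobs rather than anything covered in this video, but as a quick sketch (the Parquet path is hypothetical):

# Read-time partitions follow the input splits; size per partition
# is capped by spark.sql.files.maxPartitionBytes (default 128 MB).
df = spark.read.parquet("events.parquet")
print(df.rdd.getNumPartitions())

# Shuffle stages (groupBy, join, ...) produce
# spark.sql.shuffle.partitions partitions (default 200).
spark.conf.set("spark.sql.shuffle.partitions", "64")

# Changing partitioning explicitly:
df = df.repartition(32)  # full shuffle; can increase or decrease
df = df.coalesce(8)      # narrow transformation; only decreases, avoids a shuffle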
Bro, can you help with Spark Scala? To find duplicates, groupBy needs all the columns, and when I try to use the header inside groupBy it doesn't work, so I have to list every column header separately in double quotes. Any solution?
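One way around listing columns by hand, sketched in PySpark (untested against the commenter's data; in Scala the same unpacking idea is df.groupBy(df.columns.map(col): _*)):

from pyspark.sql.functions import col

# Unpack every column into groupBy instead of typing each name;
# any group with count > 1 is a duplicated row.
dupes = df.groupBy(*df.columns).count().filter(col("count") > 1)
dupes.show()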
You are taking all the data to the driver; that will create memory issues, right? What about listeners to get the metrics?
We are not taking the data to the driver, only the count. Listeners are also evolving; let me check and update. Thanks for the comment.
Thank you
Thanks for all your support 😊
Hi, are you giving any training to people who need real-time knowledge and experience? If so, please let me know how to get in touch with you.
We could also do:
sum(df.rdd.glom().map(len).collect())
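One caveat on glom(): it materializes each partition as an in-memory list on the executors, which can be heavy for wide rows. A lighter variant of the same per-partition idea (a sketch, not from the video):

# Count rows per partition lazily, then sum the per-partition counts.
counts = df.rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()
total = sum(counts)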