Spark Scenario Based Question | Alternative to df.count() | Use Case For Accumulators | learntospark
- Published: 11 Jan 2025
- In this video, we discuss a scenario-based question on Spark: an alternative way to find the row count of a DataFrame in a Spark application, and the use case for accumulators, shown through a short demo.
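Roughly, the accumulator idea from the demo in PySpark (a minimal sketch; the file path, app name, and variable names are illustrative, not taken from the video):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("accumulator-count").getOrCreate()
df = spark.read.csv("employees.csv", header=True)  # hypothetical input file

# The accumulator starts at 0 on the driver; each executor task adds to it.
row_count = spark.sparkContext.accumulator(0)

# foreach is an action, so one pass over the data both touches every row
# and tallies the count, instead of running a separate df.count() job.
df.foreach(lambda row: row_count.add(1))

# Only the final number travels back to the driver, never the rows.
print("row count:", row_count.value)

In practice you would piggyback the accumulator on an action you already run (a write, for example), so the count comes essentially for free.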
DataSet:
github.com/aza...
Blog link to learn more on Spark:
www.learntospark.com
Linkedin profile:
/ azarudeen-s-83652474
FB page:
/ learntospark-104523781...
Can you please make these kinds of videos more frequently?
Well explained 👏 👍
Please provide Databricks interview Q&A.
Interview Q&A videos are lined up this year. Stay tuned. Hope you find it useful. Thanks
Can you make a video on Spark partitioning: how Spark decides the number of partitions while reading data and during shuffle operations, and how to choose and change the number of partitions?
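These are standard Spark knobs rather than anything covered in this video, but as a quick sketch (the Parquet path is hypothetical):

# Read-time partitions follow the input splits; size per partition
# is capped by spark.sql.files.maxPartitionBytes (default 128 MB).
df = spark.read.parquet("events.parquet")
print(df.rdd.getNumPartitions())

# Shuffle stages (groupBy, join, ...) produce
# spark.sql.shuffle.partitions partitions (default 200).
spark.conf.set("spark.sql.shuffle.partitions", "64")

# Changing partitioning explicitly:
df = df.repartition(32)  # full shuffle; can increase or decrease
df = df.coalesce(8)      # narrow transformation; only decreases, avoids a shuffle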
Bro, can you help with Spark Scala? To find duplicates, groupBy needs all the columns, and when I try to use the header inside groupBy it doesn't work, so I have to list every column header separately in double quotes. Any solution?
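One way around listing columns by hand, sketched in PySpark (untested against the commenter's data; in Scala the same unpacking idea is df.groupBy(df.columns.map(col): _*)):

from pyspark.sql.functions import col

# Unpack every column into groupBy instead of typing each name;
# any group with count > 1 is a duplicated row.
dupes = df.groupBy(*df.columns).count().filter(col("count") > 1)
dupes.show()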
You are taking all the data to the driver; that will create memory issues, right? What about listeners to get the metrics?
We are not taking the data to the driver, only the count. Listeners are also evolving; let me check and update. Thanks for the comment.
Thank you
Thanks for all your support 😊
Hi, are you giving any training to people who need real-time knowledge and experience? If so, please let me know how to get in touch with you.
We could also do:
sum(df.rdd.glom().map(len).collect())
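One caveat on glom(): it materializes each partition as an in-memory list on the executors, which can be heavy for wide rows. A lighter variant of the same per-partition idea (a sketch, not from the video):

# Count rows per partition lazily, then sum the per-partition counts.
counts = df.rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()
total = sum(counts)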