Spark Scenario Based Question | Alternative to df.count() | Use Case For Accumulators | learntospark

  • Published: 11 Jan 2025
  • In this video, we discuss a scenario-based Spark question: an alternative way to find the count of a DataFrame in a Spark application. Through this demo, we also learn about a use case for accumulators.
    DataSet:
    github.com/aza...
    Blog link to learn more on Spark:
    www.learntospark.com
    Linkedin profile:
    / azarudeen-s-83652474
    FB page:
    / learntospark-104523781...

Comments • 12

  • @sudheersatya3206 · 4 years ago +3

    Can you please make these kinds of videos more frequently?

  • @sravankumar1767 · 2 years ago

    Good explanation 👏 👍

  • @smdimran6960 · 1 year ago +1

    Please provide Databricks interview Q&A.

    • @AzarudeenShahul · 1 year ago

      Interview Q&A videos are lined up this year. Stay tuned; hope you find them useful. Thanks!

  • @ravikirantuduru1061 · 4 years ago

    Can you make a video on Spark partitioning: how Spark decides the number of partitions while reading data and during shuffle operations, and how to choose and change the number of partitions?

  • @maheshk1678 · 4 years ago

    Bro, can you share this in Spark Scala? To find duplicates with groupBy I need to pass all the columns; when I try to use the header list inside groupBy it doesn't work, so I have to write out each column name separately in double quotes. Any solution?

  • @radhakrishnanselvaraj518 · 4 years ago +1

    You are bringing all the data to the driver; that will create memory issues, right? What about using listeners to get the metrics?

    • @AzarudeenShahul · 4 years ago

      We are not bringing the data into driver memory; only the count comes back to the driver. Listeners are also evolving; let me check and update. Thanks for the comment!

  • @nagamanickam6604 · 1 year ago

    Thank you

  • @jagadeeswarap2125 · 4 years ago

    **IMP**
    Hi, do you offer any training for people who need real-time knowledge and experience? If so, please let me know how to get in touch with you.

  • @localmartian9047 · 2 years ago

    We could also do:
    sum(df.rdd.glom().map(len).collect())