45. Databricks | Spark | Pyspark | PartitionBy

  • Published: 4 Nov 2024

Comments • 23

  • @SureshBabu-kf5jx 10 months ago +2

    Hi Raja, can you explain the difference between partitionBy, repartition, and the shuffle partitions parameter? I remember from the previous videos that we use repartition while reading and writing a DataFrame to disk, and that the shuffle partitions setting increases or decreases the number of partitions when data is shuffled in transformations. Can you please clarify this for me? Thanks

  • @Basket-hb5jc 5 months ago +2

    Best creator on PySpark. Continue doing this.

    • @rajasdataengineering7585 5 months ago +1

      Thank you!

    • @Basket-hb5jc 5 months ago

      @rajasdataengineering7585 Hi, I have a question. Which operations can cause an EMR cluster to run out of memory (OOM)?

  • @swapnilgosawi 1 month ago

    If possible, can you also explain whether we can update only a certain range of partitions? For example, if the data is partitioned by month and I want to update only the last 3 months of partitions, how can we achieve that?

  • @DeepakPatel-vc7yr 1 year ago +1

    Hi Raja, thanks for posting all the concepts! Have you shared the datasets you refer to in the lectures? Could we have these datasets, please?

  • @sravankumar1767 3 years ago +3

    Very useful videos, can you please do more videos?

  • @gulsahtanay2341 8 months ago +1

    Very useful content

  • @parameshgosula5510 3 years ago +1

    Crisp and clear

  • @simanchalmaharana2927 9 months ago +1

    Please make a detailed video on salting techniques and how to do salting.

  • @samridhisamridhi6246 2 years ago

    Hi Raja, while writing the DataFrame to DBFS or Blob storage, is there a way to write only the part file and not the system files?

  • @jagadeeswaran330 5 months ago +1

    Nice sir!

  • @vineethreddy.s 2 years ago

    If I read this partitioned data, the columns on which the data was partitioned come last, so the schema changes. Is there a way to preserve the schema?

  • @kaminipriya9835 11 months ago +1

    Hi Sir, may I know the difference between partitionBy and repartition? It's a bit confusing.

    • @rajasdataengineering7585 11 months ago

      Hi Kamini, partitionBy and repartition are completely different. partitionBy is used while writing a DataFrame to a storage system: for each distinct key value, a new folder is created in the storage location.
      repartition is used to reduce or increase the number of partitions in Spark memory while applying transformations.
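
As a rough illustration of the distinction drawn in this reply (not code from the video), the sketch below puts the two calls side by side. The helper names `write_by_key`, `key_folders`, `with_partition_count` and `demo`, the sample rows, and the column names are all made up for the example. It also assumes `out_dir` is a local filesystem path, so that `os.listdir` can see the output; on DBFS or Blob storage the folders would need to be listed another way.

```python
import os


def write_by_key(df, path, key):
    """partitionBy: write `df` as Parquet with one `key=value` folder per distinct key."""
    df.write.partitionBy(key).mode("overwrite").parquet(path)


def key_folders(path):
    """Return the `key=value` sub-folders found directly under a local `path`."""
    return sorted(
        name
        for name in os.listdir(path)
        if "=" in name and os.path.isdir(os.path.join(path, name))
    )


def with_partition_count(df, n):
    """repartition: return a DataFrame spread over `n` in-memory partitions."""
    return df.repartition(n)


def demo(spark, out_dir):
    """Run both operations on a made-up three-row DataFrame (hypothetical data)."""
    df = spark.createDataFrame(
        [("IN", 2023, 10), ("IN", 2024, 20), ("US", 2024, 30)],
        ["country", "year", "amount"],
    )
    # partitionBy changes the folder layout on disk, not the data in memory.
    write_by_key(df, out_dir, "country")
    print(key_folders(out_dir))  # folders named like country=<value>
    # repartition changes how rows are split in memory; nothing is written.
    resized = with_partition_count(df, 4)
    print(resized.rdd.getNumPartitions())
```

With a session from `SparkSession.builder.getOrCreate()` (or the predefined `spark` in a Databricks notebook), calling `demo(spark, "/tmp/sales_by_country")` would run both steps; the output path here is only a placeholder.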

    • @kaminipriya9835 11 months ago +1

      @rajasdataengineering7585 Thanks for the reply, much needed :)

    • @rajasdataengineering7585 11 months ago

      Welcome!

  • @aperez1969 2 years ago +1

    Good work Raja!

  • @omkargurme20 9 months ago

    How do we create weekly partitions?