PySpark Tutorial 6: PySpark DataFrame Functions | PySpark with Python

  • Published: Nov 4, 2024

Comments • 17

  • @pritishbanerjee5517 · 3 years ago

    Hi Amir, great content. One question: could you please share how to sort in descending order? Thanks.

    • @StatsWire · 3 years ago · +2

      Hi Pritish, thank you for liking the content. Here are examples of sorting with descending order (department ascending, then state descending):
      from pyspark.sql.functions import col
      df.sort(df.department.asc(), df.state.desc()).show(truncate=False)
      df.sort(col("department").asc(), col("state").desc()).show(truncate=False)
      df.orderBy(col("department").asc(), col("state").desc()).show(truncate=False)

    • @StatsWire · 3 years ago · +1

      Hi Pritish, here is the link to the emp_data.csv dataset used in the window functions video:
      GitHub: github.com/siddiquiamir/Data/blob/master/emp_data.csv

    • @pritishbanerjee5517 · 3 years ago

      @StatsWire Thanks a lot, Amir.

    • @StatsWire · 3 years ago · +1

      @pritishbanerjee5517 You're welcome, Pritish.

  • @r.ritika2963 · 1 year ago

    Is there a function to find the mode of a particular column?

    • @StatsWire · 1 year ago

      We have to write a custom function for that.

  • @statisticalseminarsdcmeetu8491 · 8 months ago

    ❤❤

  • @mazharalamsiddiqui6904 · 3 years ago

    Nice

  • @bhaswatibaishya251 · 1 year ago

    How do you partition a CSV dataset?

    • @StatsWire · 1 year ago

      Define the partitioning criteria:
      Decide how the dataset should be divided, for example into training and test data.
      Split the dataset:
      Apply those criteria to split the dataset into subsets. The DataFrame API's randomSplit method does this:
      train_ratio = 0.8  # proportion of data for training
      test_ratio = 1 - train_ratio  # proportion of data for testing
      train_df, test_df = df.randomSplit([train_ratio, test_ratio], seed=42)
      # Each save() path becomes a directory of part-*.csv files, not a single CSV file
      train_df.write.format("csv").option("header", "true").mode("overwrite").save("train_dataset.csv")
      test_df.write.format("csv").option("header", "true").mode("overwrite").save("test_dataset.csv")

  • @RangaSwamyleela · 3 years ago

    How do I get access to those?

    • @StatsWire · 3 years ago

      You will get access to all the videos. Those are scheduled videos; they will go public one by one, and then you can access all of them.

  • @RangaSwamyleela · 3 years ago

    Some are private.

    • @StatsWire · 3 years ago

      You will get access to all the videos. Those are scheduled videos; they will go public one by one, and then you can access all of them.