Sports Data Analysis using PySpark - Part 02

  • Published: 29 Nov 2024
  • Science

Comments • 3

  • @maazahmedansari4334 6 months ago +1

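    Insightful, thank you for uploading.
    We can avoid the min window function in the last problem.
    Solution:
    from pyspark.sql import Window
    from pyspark.sql import functions as f

    # Pair each login with the same user's next login date.
    windowSpec = Window.partitionBy("user_id").orderBy("login_date")
    lead_df = input_df.withColumn(
        "next_date",
        f.lead("login_date").over(windowSpec)
    ).orderBy("user_id", "login_date")
    # Keep rows whose next login is exactly one day later, then project user_id.
    # datediff(end, start) returns the number of days from start to end.
    result = lead_df.filter(
        f.datediff(f.col("next_date"), f.col("login_date")) == 1
    ).select("user_id")
    result.show()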
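
    For readers without a Spark session at hand, the lead-based comparison above can be sketched in plain Python. This is only an illustration under assumptions: the function name users_with_consecutive_logins and the sample rows are hypothetical, not taken from the video, and the input is assumed to be (user_id, login_date) pairs.

```python
from collections import defaultdict
from datetime import date


def users_with_consecutive_logins(rows):
    """Return one user_id per pair of logins that are exactly one day apart.

    rows: iterable of (user_id, login_date) tuples (hypothetical input shape).
    """
    # Group login dates per user, like Window.partitionBy("user_id").
    by_user = defaultdict(list)
    for user_id, login_date in rows:
        by_user[user_id].append(login_date)

    result = []
    for user_id in sorted(by_user):
        # Sort within the group, like orderBy("login_date").
        dates = sorted(by_user[user_id])
        # zip(dates, dates[1:]) pairs each date with the next one, like lead();
        # the last date has no successor and is skipped, as a null lead would be.
        for current, nxt in zip(dates, dates[1:]):
            if (nxt - current).days == 1:
                result.append(user_id)
    return result


# Made-up sample data for illustration only.
sample = [
    (1, date(2024, 1, 1)),
    (1, date(2024, 1, 2)),
    (1, date(2024, 1, 3)),
    (2, date(2024, 1, 1)),
    (2, date(2024, 1, 5)),
]
print(users_with_consecutive_logins(sample))  # [1, 1]
```

    User 1 appears twice because two pairs of logins are one day apart; in the Spark version, a .distinct() on the result would collapse such repeats if only the set of users is wanted.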

  • @siddheshchavan2069 5 months ago +2

    Great series. Can you upload more videos like this, with complex problems and bigger datasets?

    • @TheBigDataShow 5 months ago

      Please check the other videos in the same playlist. We have uploaded close to 4 videos.