Insightful, thank you for uploading.
We can avoid the min window function in the last problem by using lead instead.
Solution:
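from pyspark.sql import Window
from pyspark.sql import functions as f

# Order each user's logins by date so lead() returns that user's next login
windowSpec = Window.partitionBy("user_id").orderBy("login_date")
lead_df = input_df.withColumn(
    "next_date",
    f.lead("login_date").over(windowSpec)
)
# Filter before select so both date columns are still available;
# datediff is used because date_diff only exists in newer PySpark releases
result = lead_df.filter(
    f.datediff(f.col("next_date"), f.col("login_date")) == 1
).select("user_id")
result.show()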
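For anyone who wants to check the lead logic without a Spark session, here is a rough plain-Python sketch of the same idea. The function name and the sample rows are made up for illustration, and it assumes each row is a (user_id, login_date) pair. Like the Spark version, it returns one entry per qualifying row, so a user with several consecutive days shows up more than once.

```python
from datetime import date
from itertools import groupby


def users_with_next_day_login(rows):
    """Return user_ids whose next login falls exactly one day after a login.

    rows: iterable of (user_id, login_date) tuples (hypothetical layout).
    """
    hits = []
    # Sorting by (user_id, login_date) mirrors partitionBy + orderBy
    ordered = sorted(rows)
    for user_id, group in groupby(ordered, key=lambda r: r[0]):
        dates = [d for _, d in group]
        # Pair each date with the one after it; as with lead(),
        # the last date in a partition has no successor
        for current, nxt in zip(dates, dates[1:]):
            if (nxt - current).days == 1:
                hits.append(user_id)
    return hits


# Illustrative data only, not taken from the video
sample = [
    (1, date(2024, 1, 1)),
    (1, date(2024, 1, 2)),
    (1, date(2024, 1, 5)),
    (2, date(2024, 1, 1)),
    (2, date(2024, 1, 3)),
]
print(users_with_next_day_login(sample))
```

Wrapping the result in set(...) would give one entry per user, if that is what the problem asks for.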
Great series. Can you upload more videos like this, with complex problems and bigger datasets?
Please check the other videos in the same playlist. We have uploaded about 4 videos.