Hi Amir, great content. One doubt: can you please share how to sort in descending order? Thanks.
Hi Pritish, thank you for liking the content. Here is an example of sorting in descending order:
from pyspark.sql.functions import col  # needed for the col(...) variants below
df.sort(df.department.asc(), df.state.desc()).show(truncate=False)
df.sort(col("department").asc(), col("state").desc()).show(truncate=False)
df.orderBy(col("department").asc(), col("state").desc()).show(truncate=False)
Hi Pritish, here is the link to the emp_data.csv dataset for window functions.
GitHub: github.com/siddiquiamir/Data/blob/master/emp_data.csv
@StatsWire thanks a lot, Amir.
@pritishbanerjee5517 You're welcome, Pritish.
Is there a function to find the mode of a particular column?
We have to write a custom function for that.
❤❤
Thank you
Nice
Thank you
How do I partition a CSV dataset?
Define partitioning criteria:
Determine the criteria for partitioning your dataset, such as separating training and testing data.
Split the dataset:
Apply the partitioning criteria to split the dataset into subsets. You can use the DataFrame API's randomSplit function to achieve this:
train_ratio = 0.8 # proportion of data for training
test_ratio = 1 - train_ratio # proportion of data for testing
train_df, test_df = df.randomSplit([train_ratio, test_ratio], seed=42)
train_df.write.format("csv").option("header", "true").mode("overwrite").save("train_dataset.csv")
test_df.write.format("csv").option("header", "true").mode("overwrite").save("test_dataset.csv")
How can I get access to those?
You will get access to all the videos. Those are scheduled videos; they will go public one by one, and then you can access all of them.
Some are private
You will get access to all the videos. Those are scheduled videos; they will go public one by one, and then you can access all of them.