Converting CSV to Nested JSON/Dictionary format in PySpark | Real-time scenario
- Published: 6 Mar 2022
- Hi Friends,
In this video, I have explained some sample Python code to read a CSV file and convert the records into nested JSON format.
Code- github.com/sravanapisupati/Sa...
Please subscribe to my channel for more interesting learning.
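A minimal sketch of the general idea (not the exact code from the video; that is in the GitHub link above), assuming a hypothetical employees.csv with id, name, city, and state columns:

from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.appName("csv-to-nested-json").getOrCreate()

# Hypothetical input file and columns; the real dataset is in the GitHub repo linked above.
df = spark.read.csv("employees.csv", header=True, inferSchema=True)

# Nest the address-like columns under one struct, then serialize each record to JSON.
nested = df.select(
    "id",
    "name",
    f.struct("city", "state").alias("address"),
)

for row in nested.toJSON().collect():  # one JSON string per record
    print(row)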
Thank you so much for this video 🙏
Nice explanation, Sravana.
Thank you Sravan, hope this is what you were looking for.
@@sravanalakshmipisupati6533 Can you please share your email? I will share my user story; actually, I don't understand it.
@@sravankumar1767 Sravana.Pisupati@gmail.com
Thanks. If possible, please upload the datasets you are using in the playlist; some are not there.
Sure.
Added files, please check.
Thank you, @@sravanalakshmipisupati6533.
💣🙏👍
Will such kinds of questions be asked in interviews? Because it's purely based on usage of built-in functions provided by PySpark.
Can you please let me know how to prepare for such kinds of questions for interviews?
It's a real-time scenario. And yes, they might ask this in coding interviews as well. Good hands-on practice is the only way to prepare for coding rounds.
How do I use that myFormat variable to save data to a Hive table in the same format?
Hi Prashanth,
Please use the below -
df.write.saveAsTable("db_name.hive_table_name")
You can add partitionBy() also if you want.
df.write.partitionBy("col1").saveAsTable("db.table")
@@sravanalakshmipisupati6533 I need to use that myFormat variable (which is a list in JSON form) to store data in Hive in that same format. I hope you understood my question.
@@andheprashanth4686 Create a DataFrame out of the myFormat JSON and then you can write it as a Hive table.
df = spark.createDataFrame(myFormat)
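A minimal sketch of that flow, assuming myFormat is the list of nested Python dicts built in the video (its exact structure is assumed here) and that Hive support is available; the database and table names are placeholders:

from pyspark.sql import SparkSession

# enableHiveSupport() is needed for saveAsTable() to write to the Hive metastore.
spark = SparkSession.builder.appName("hive-write-demo").enableHiveSupport().getOrCreate()

# Assumed shape of myFormat; in the video it is built from the CSV records.
myFormat = [
    {"id": 1, "info": {"name": "Alice", "email": "alice@example.com"}},
    {"id": 2, "info": {"name": "Bob", "email": "bob@example.com"}},
]

df = spark.createDataFrame(myFormat)

# Write to Hive, keeping the nested structure as a map/struct column.
df.write.mode("overwrite").saveAsTable("db_name.hive_table_name")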
Hello ma'am, how do I read PDF files and clean the data in Spark (Scala)?
Thank you for watching the video. I'll get back to you on this.
Getting an "__init__ missing positional argument" error while performing groupBy and aggregation on my df.
Please share the code which you are trying to execute.
@@sravanalakshmipisupati6533 df = ['key', 'id', 'name', 'email']
df1 = df.groupby("key").agg(f.map_from_entries(f.collect_list(f.struct("id", "name", "email"))).alias("info"))
Printing df looked fine, but while grouping and aggregating I am facing that error.
@@ayaanshdas1412 Please create the DataFrame through a SparkSession and try again.
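A minimal sketch of what that reply suggests, with hypothetical sample rows (the real data and schema are assumed from the comment above). Note that map_from_entries() expects an array of two-field (key, value) structs, so name and email are nested inside the value struct here:

from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.appName("groupby-nested-demo").getOrCreate()

# Hypothetical rows matching the key/id/name/email columns from the comment.
data = [
    ("k1", 1, "Alice", "alice@example.com"),
    ("k1", 2, "Bob", "bob@example.com"),
    ("k2", 3, "Carol", "carol@example.com"),
]
df = spark.createDataFrame(data, ["key", "id", "name", "email"])

# Collect (id -> {name, email}) entries per key into a map column.
df1 = df.groupBy("key").agg(
    f.map_from_entries(
        f.collect_list(f.struct("id", f.struct("name", "email").alias("value")))
    ).alias("info")
)
df1.show(truncate=False)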