Converting CSV to Nested JSON / Dictionary format in PySpark | Realtime scenario

  • Published: 6 Mar 2022
  • Hi Friends,
    In this video, I have explained some sample Python code to read a CSV file and convert its records into JSON format.
    Code - github.com/sravanapisupati/Sa...
    Please subscribe to my channel for more interesting learnings.
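The video's actual code and dataset are on GitHub (link above). As a rough, Spark-free illustration of the idea the title describes, here is a minimal pure-Python sketch that groups flat CSV records into a nested dictionary and dumps it as JSON; the column names and sample data are made up for the example:

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample data; the video's real dataset is on GitHub.
CSV_TEXT = """dept,id,name
sales,1,Alice
sales,2,Bob
hr,3,Carol
"""

def csv_to_nested_json(text):
    """Group flat CSV rows by 'dept' into a nested dict, then dump as JSON."""
    nested = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        dept = row.pop("dept")  # the grouping key becomes the outer dict key
        nested[dept].append(row)  # remaining columns become the nested records
    return json.dumps(nested, indent=2)

print(csv_to_nested_json(CSV_TEXT))
```

In PySpark the same nesting is typically done with `groupBy` plus functions like `collect_list` and `struct`; this stdlib version just shows the target shape of the output.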
  • Science

Comments • 22

  • @svsphaneendra • 1 year ago +1

    Thank you so much for this video 🙏

  • @sravankumar1767 • 2 years ago

    Nice explanation sravana

  • @soumyakantarath4551 • 6 months ago +1

    Thanks - if possible please upload the datasets you are using in the playlist - some are not there.

  • @ardataengineering6521 • 1 year ago

    💣🙏👍

  • @ambadasshinde6670 • 2 years ago +1

    Will such kinds of questions be asked in interviews? Because this is purely based on the usage of built-in functions provided by PySpark.
    Can you please let me know how to prepare for such questions for interviews?

    • @sravanalakshmipisupati6533 • 2 years ago +1

      It's a real-time scenario, and yes, they might ask this in coding interviews as well. Good hands-on practice is the only way to prepare for coding rounds.

  • @andheprashanth4686 • 1 year ago +1

    How to use that myformat variable to save data to a Hive table in the same format?

    • @sravanalakshmipisupati6533 • 1 year ago

      Hi Prashanth,
      Please use the below -
      Dataframe.write.saveAsTable("db_name.hive_table_name")
      You can also add partitionBy() if you want:
      Dataframe.write.partitionBy("col1").saveAsTable("db.table")

    • @andheprashanth4686 • 1 year ago

      @@sravanalakshmipisupati6533 I need to use that myformat variable (which is a list in the form of JSON) to store data in Hive using that same format. I hope you understood my question.

    • @sravanalakshmipisupati6533 • 1 year ago

      @@andheprashanth4686 Create a dataframe out of the myFormat JSON and then you can write it as a Hive table:
      Dataframe = spark.createDataFrame(myFormat)

  • @rajdeepsinghborana2409 • 2 years ago

    Hello ma'am, how to read PDF files and clean data in Spark (Scala)?

  • @ayaanshdas1412 • 1 year ago

    Getting an "__init__ missing positional argument" error while performing grouping and aggregation on my df.

    • @sravanalakshmipisupati6533 • 1 year ago

      Please share the code which you are trying to execute.

    • @ayaanshdas1412 • 1 year ago

      @@sravanalakshmipisupati6533 df = ['key','id','name','email']
      df1 = df.groupby("key").agg(f.map_from_entries(f.collect_list(f.struct("id","name","email"))).alias("info"))
      Printing df looked fine, but while grouping and aggregating I am facing that error.

    • @sravanalakshmipisupati6533 • 1 year ago +1

      @@ayaanshdas1412 Please create the dataframe through SparkSession and try again.
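The error in this thread comes from calling DataFrame methods on a plain Python list: `df = ['key', 'id', 'name', 'email']` is just a list, so `df.groupby(...)` cannot work; the list must first become a real DataFrame via `spark.createDataFrame(...)`. As a rough, Spark-free sketch of the shape that grouping-and-collecting aggregation is aiming for (a per-key map built from the collected rows; the sample data here is made up), the same logic in pure Python looks like this:

```python
from collections import defaultdict

# Hypothetical rows standing in for the commenter's DataFrame contents.
rows = [
    {"key": "a", "id": 1, "name": "Alice", "email": "alice@example.com"},
    {"key": "a", "id": 2, "name": "Bob", "email": "bob@example.com"},
    {"key": "b", "id": 3, "name": "Carol", "email": "carol@example.com"},
]

def group_info(rows):
    """For each key, collect the rows into a map id -> {name, email}."""
    out = defaultdict(dict)
    for r in rows:
        out[r["key"]][r["id"]] = {"name": r["name"], "email": r["email"]}
    return dict(out)

print(group_info(rows))
```

In Spark itself, the equivalent starting point would be `df = spark.createDataFrame(rows)` followed by the `groupBy`/`agg` call from the comment above.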