Cleansing the CSV data and processing it in PySpark | Scenario-based question | Spark Interview Questions

  • Published: 4 Feb 2022
  • Hi Friends,
    Sample code is checked into GitHub:
    github.com/sravanapisupati/Sa...
    In this video, I have explained the procedure for reading a CSV file and processing it using PySpark.
    The CSV has multiple lines for a single Id and uneven columns (a different number of columns in each row); a sketch of one possible approach appears below the description.
    Please subscribe to my channel for more interesting learnings.
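    A minimal sketch of one way the described cleansing could look, assuming a CSV
    with a header row, several rows per Id, and a varying number of columns per row;
    the file path and column names are illustrative assumptions, not the exact file
    used in the video:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("csv-cleansing").getOrCreate()

    # PERMISSIVE mode keeps rows whose column count does not match the header;
    # missing trailing fields simply come through as nulls.
    raw_df = (
        spark.read
        .option("header", True)        # first line of the file holds the column names
        .option("mode", "PERMISSIVE")
        .csv("input/customers.csv")    # assumed path
    )

    # Collapse the multiple lines per Id into one row by taking the first
    # non-null value of every other column for that Id.
    clean_df = (
        raw_df.groupBy("Id")
        .agg(*[F.first(c, ignorenulls=True).alias(c) for c in raw_df.columns if c != "Id"])
    )

    clean_df.show(truncate=False)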

Comments • 31

  • @yashwantdhole7645 · 3 months ago

    Excellent

  • @sudippandit1 · 2 years ago +2

    Your tutorials are simply special, Sravana!!

  • @sravankumar1767 · 2 years ago +2

    Superb, everyone can easily understand 👍 👏

  • @deathseal9844 · 10 months ago +1

    Amazing video. Now I know where I went wrong. Thanks for the video.

  • @sidtribhuvan6607 · 1 year ago +2

    Thanks for the information

  • @khandoor7228 · 1 year ago +1

    excellent lesson!

  • @sravankumar1767 · 2 years ago +2

    Please do more scenario-based videos on PySpark. In our current project we are doing transformations with PySpark in ADB; ADF is used only for data movement.

  • @DileepKumar-ko6cy · 2 years ago

    Good Explanation.👍

  • @shahids4583 · 5 months ago

    @sparklingFuture
    Why can't we use pivot and then filter the data on top of it? It would be a one-liner, right?
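    One way to read this suggestion, sketched with toy data in a long (Id, attr, value)
    shape; the column names and sample rows are assumptions, not the video's dataset:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("pivot-sketch").getOrCreate()

    # Toy long-format data: one row per (Id, attribute) pair.
    long_df = spark.createDataFrame(
        [(1, "name", "Anil"), (1, "city", "Pune"), (2, "name", "Ravi")],
        ["Id", "attr", "value"],
    )

    # pivot() turns each attribute into its own column, giving one row per Id.
    pivoted = (
        long_df.groupBy("Id")
        .pivot("attr")
        .agg(F.first("value", ignorenulls=True))
    )
    pivoted.show()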

  • @akashbalmiki3616 · 2 years ago +1

    Your videos are awesome with a more advanced approach, but please upgrade your audio setup. It's a request.. 🙏

  • @sravankumar1767 · 2 years ago +2

    Can you please cover this scenario: how to load a CSV file into JSON with a nested hierarchy using PySpark in ADB? For example, a CSV with custid, custname, itemname, quantity, which when converted to nested JSON becomes custid, custname, purchases: { itemname: book, quantity: 2 }, since one customer can buy multiple items.
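    A rough sketch of the conversion being asked about, using the columns named in
    the comment (custid, custname, itemname, quantity); the file paths are assumptions:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("csv-to-nested-json").getOrCreate()

    orders = (
        spark.read
        .option("header", True)
        .csv("input/orders.csv")   # assumed path
    )

    # One record per customer, with a purchases array of {itemname, quantity}.
    nested = (
        orders.groupBy("custid", "custname")
        .agg(F.collect_list(F.struct("itemname", "quantity")).alias("purchases"))
    )

    nested.write.mode("overwrite").json("output/customers_json")   # assumed path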

  • @rajanib9057 · 10 months ago

    Hello... can you please confirm, when you first extracted data from the CSV, where did you specify the column names? How were the column names generated in the show() output?

    • @sravanalakshmipisupati6533 · 10 months ago +1

      Hi Rajani, the header is present in the input file, and I set the header option to true so that the column names are displayed in show().
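      A small illustration of that option; the path is an assumption:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("header-demo").getOrCreate()

      # With header=True, the first line of the file supplies the column names in show().
      spark.read.option("header", True).csv("input/customers.csv").show()

      # Without it, Spark assigns default names _c0, _c1, ...
      spark.read.csv("input/customers.csv").show()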

    • @rajanib9057 · 10 months ago

      @@sravanalakshmipisupati6533 Thank you so much. Also, can you please post more videos on ingesting and transforming data from/to on-premises databases and other cloud storage?

    • @sravanalakshmipisupati6533 · 10 months ago +1

      @@rajanib9057 Thank you. For on-premises sources, most of the ingestion with different file formats is already covered. Please check my videos: github.com/sravanapisupati/SampleDataSet/blob/main/RUclips_videos_list

  • @sravankumar1767 · 2 years ago +1

    How to merge Spark DataFrames with complex types: if we have two JSON files and the json1 and json2 schemas are different, how can we merge them using PySpark? Can you please explain this scenario?

    • @sravanalakshmipisupati6533 · 2 years ago +1

      Sure Sravan, I will get back to you soon for all your questions. Could you please share sample data?

    • @sravankumar1767 · 2 years ago

      @@sravanalakshmipisupati6533 ok

    • @sravanalakshmipisupati6533 · 2 years ago

      Hi Sravan, you can read the 2 JSON files separately as 2 DataFrames and then join them. If this is not what you are looking for, then please give me the detailed problem statement with some sample data. Thanks.
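      A sketch of that suggestion; the file paths and the join key "id" are
      assumptions, and unionByName is shown only as an alternative if the goal
      is to stack rows rather than join:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("merge-json").getOrCreate()

      df1 = spark.read.json("input/file1.json")
      df2 = spark.read.json("input/file2.json")

      # Join on a shared key when the two files describe the same entities.
      merged = df1.join(df2, on="id", how="outer")
      merged.show()

      # Alternative (Spark 3.1+): stack rows from both files even though the
      # schemas differ; missing columns are filled with nulls.
      stacked = df1.unionByName(df2, allowMissingColumns=True)
      stacked.show()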