10 PySpark Product Based Interview Questions

  • Published: 19 Oct 2024

Comments • 23

  • @abhishekn786
    @abhishekn786 3 months ago +5

    Thanks for the video. Please continue these PySpark interview videos.
    Thanks again.

  • @Sandeep-bl9ji
    @Sandeep-bl9ji 9 months ago +1

    Really a nice explanation, very clear... Thanks a lot... Please keep making more videos on PySpark

  • @Vanmathi-e5o
    @Vanmathi-e5o 2 months ago +2

    Only one video on the PySpark playlist ...
    Pls post more!!

  • @vishnuvardhan9082
    @vishnuvardhan9082 9 months ago +1

    Loving your channel more day by day

  • @BigData-fu6jd
    @BigData-fu6jd 26 days ago

    Really helpful..... Looking forward to more videos :)

    • @thedatatech
      @thedatatech 25 days ago +1

      Glad to hear that

    • @BigData-fu6jd
      @BigData-fu6jd 25 days ago

      @@thedatatech Please upload more videos, and list related topic videos in the description

  • @ashishlimaye2408
    @ashishlimaye2408 7 months ago

    Great questions!

  • @amit4rou
    @amit4rou 2 months ago +2

    Doubt in the 1st question:
    The delimiters in the data are "," "\t" "|"
    Then why did you use ",|\t|\|"?
    Please explain.

    • @divit00
      @divit00 2 months ago +1

      split() takes a regex pattern: ",|\t|\|" means , OR \t OR |. The last pipe is escaped because a bare | is the regex OR operator.
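
The same alternation can be checked outside Spark with Python's re module (Spark's split() takes a Java regex, but this pattern behaves identically in both; the sample line below is made up):

```python
import re

# ",|\t|\|" matches a comma OR a tab OR a literal pipe;
# the final pipe is escaped because a bare | is the regex OR operator.
line = "1,john\tdoe|5000"
parts = re.split(r",|\t|\|", line)
print(parts)  # ['1', 'john', 'doe', '5000']
```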

  • @Ameem-rw4ir
    @Ameem-rw4ir 5 months ago

    Bro, thanks for your effort in sharing real-time interview questions and answers. Can you please share real-time streaming (Kafka and PySpark) interview questions and answers?

  • @akhilchandaka4053
    @akhilchandaka4053 1 month ago

    Hi bro,
    You always teach amazing stuff.
    Great work 😊
    I have a request:
    Can you please give a syllabus or some kind of preparation strategy for Spark?

  • @sudarshanthota3369
    @sudarshanthota3369 3 months ago

    awesome video

  • @boreddykesavareddy669
    @boreddykesavareddy669 5 months ago

    Instead of a left anti join we can use except
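
That swap only matches when rows should be compared in full. As a rough plain-Python sketch on toy tuples (not the video's data): a left anti join drops left rows whose join key exists on the right, while except (subtract/exceptAll on DataFrames) drops only exact whole-row matches, so the two can disagree.

```python
left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x")]

# Left anti join on the first column (the key):
# drop left rows whose key appears on the right.
right_keys = {r[0] for r in right}
anti = [row for row in left if row[0] not in right_keys]
print(anti)     # [(1, 'a'), (3, 'c')]

# EXCEPT: drop only exact whole-row matches.
except_ = [row for row in left if row not in right]
print(except_)  # [(1, 'a'), (2, 'b'), (3, 'c')] -- (2, 'b') != (2, 'x'), so it survives
```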

  • @businessskills98
    @businessskills98 9 months ago +1

    New subscriber added

  • @chandrarahul1990
    @chandrarahul1990 2 months ago

    I feel we can solve the 7th question using the window function row_number() as well
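
For anyone unfamiliar, a minimal plain-Python sketch of what row_number() over a window does (hypothetical department/salary rows; question 7's actual schema may differ): rows are numbered 1, 2, 3... within each partition, following the window's sort order.

```python
from itertools import groupby

# Sketch of row_number() OVER (PARTITION BY dept ORDER BY sal DESC)
# on made-up (dept, sal) rows.
rows = [("hr", 70), ("eng", 90), ("hr", 60), ("eng", 80)]
rows.sort(key=lambda r: (r[0], -r[1]))  # partition key, then ordering key

numbered = []
for dept, grp in groupby(rows, key=lambda r: r[0]):
    for rn, row in enumerate(grp, start=1):  # numbering restarts per partition
        numbered.append(row + (rn,))

print(numbered)  # [('eng', 90, 1), ('eng', 80, 2), ('hr', 70, 1), ('hr', 60, 2)]
```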

  • @saravanakumar-r1s
    @saravanakumar-r1s 9 months ago

    In an interview, if we solve the problems in SQL using Spark SQL, will it be okay?

    • @Rakesh-q7m8r
      @Rakesh-q7m8r 8 months ago

      It depends; sometimes the interviewer specifically asks you not to use SQL and to use the DataFrame APIs instead.

    • @selva30989
      @selva30989 6 months ago

      Some interviewers will ask you to share code in both Spark SQL and PySpark.
      This way they can assess your PySpark and SQL knowledge with a single scenario-based question.

  • @hyderali-wl3yi
    @hyderali-wl3yi 5 months ago +1

    Bro, thanks for your inputs. The data below is in a file. Can you please help me with how to handle this in PySpark?
    empid,fname|lname@sal#deptid
    1,mohan|kumar@5000#100
    2,karna|varadan@3489#101
    3,kavitha|gandan@6000#102
    Expected output
    empid,fname,lname,sal,deptid
    1,mohan,kumar,5000,100
    2,karan,varadan,3489,101
    3,kavitha,gandan,6000,102

    • @piyushramkar9404
      @piyushramkar9404 4 months ago

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import split, col

      spark = SparkSession.builder.appName("MyApp").getOrCreate()

      # Only the first comma is a CSV delimiter here, so everything after empid
      # lands in a single column literally named "fname|lname@sal#deptid".
      df = spark.read.format("csv") \
          .option("header", "true") \
          .option("inferSchema", "true") \
          .option("delimiter", ",") \
          .load("D:/data/employees.csv")

      rest = col("fname|lname@sal#deptid")
      # Peel the combined column apart one delimiter at a time
      # (| needs escaping in the regex; @ and # do not).
      exp_op = df.withColumn("fname", split(rest, "\\|").getItem(0)) \
          .withColumn("lname", split(split(rest, "\\|").getItem(1), "@").getItem(0)) \
          .withColumn("sal", split(split(rest, "@").getItem(1), "#").getItem(0)) \
          .withColumn("deptid", split(rest, "#").getItem(1)) \
          .select("empid", "fname", "lname", "sal", "deptid")
      exp_op.show()
      # output
      +-----+-------+-------+----+------+
      |empid|  fname|  lname| sal|deptid|
      +-----+-------+-------+----+------+
      |    1|  mohan|  kumar|5000|   100|
      |    2|  karna|varadan|3489|   101|
      |    3|kavitha| gandan|6000|   102|
      +-----+-------+-------+----+------+
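
Since all three extra delimiters are single characters, one regex character class can also split the combined column in a single pass; sketched below with Python's re on one toy row from this thread. In PySpark the analogous call would be split(col("fname|lname@sal#deptid"), "[|@#]") with .getItem(0) through .getItem(3).

```python
import re

# One character class handles all three delimiters at once
# (toy row taken from the thread above).
row = "mohan|kumar@5000#100"
fname, lname, sal, deptid = re.split(r"[|@#]", row)
print(fname, lname, sal, deptid)  # mohan kumar 5000 100
```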