Tiger Analytics PySpark Interview Question | Very Important Question of PySpark |

Поделиться
HTML-код
  • Опубликовано: 6 янв 2025

Комментарии • 38

  • @sheikirfan2652
    @sheikirfan2652 11 месяцев назад +6

    All videos in this pyspark interview playlist are highly useful Sagar. Big Thanks for your efforts man!!

  • @abhigyapranshu4791
    @abhigyapranshu4791 10 месяцев назад +9

    Use pivot funtion with Subject column to get a new column for each value in that particular column. Can use aggregate function sum on Marks. Order of Eng/Math column may not be same.
    df1=df.groupBy("Name").pivot("sub").agg(sum("Marks"))

  • @AbhisekPatra7
    @AbhisekPatra7 Месяц назад +2

    Using case when is better as using collect list may give incorrect output. Considering null or absence of one subject mark for a person. Even in production queries case when would be used in such scenarios. 🙂

  • @pratik5648
    @pratik5648 11 месяцев назад +2

    Hi Sagar , to master pyspark which of your's course should i buy?

  • @Dataengineeringlearninghub
    @Dataengineeringlearninghub 11 месяцев назад +1

    very usefull Sagar

  • @Pratik_Tortikar
    @Pratik_Tortikar 8 месяцев назад +1

    was this asked in Tiger analytics (Canada)?

  • @abhishekpathak7338
    @abhishekpathak7338 11 месяцев назад +17

    pivoted_df = df.groupBy("Name").pivot("Subject").agg({"Marks": "first"}).show()

    • @saqlainhussain007
      @saqlainhussain007 10 месяцев назад

      This is the better answer

    • @jhonsen9842
      @jhonsen9842 10 месяцев назад +3

      I don't know why he didn't use pivot function and made solution complicated. LoL

    • @prachideokar7639
      @prachideokar7639 10 месяцев назад +2

      Last me {"marks":"first"} ka kya meaning hai

    • @tandaibhanukiran
      @tandaibhanukiran 9 месяцев назад

      @@prachideokar7639 , FIRST - Aggregated values which picks first value in orderBy for the grouped values.
      In the above case,
      GroupBy value- Name (Rudra)
      Pivot Value - (Math)
      marks: first combination of above two values is {79}.

  • @surenderraja1304
    @surenderraja1304 7 месяцев назад

    I tried below df = df.groupBy(F.col("Name")).pivot(F.col("sub")).agg(F.max(F.col("Marks")))
    df.show() , but throw error as jgd = self._jgd.pivot(pivot_col) Column is not iterable

  • @throughmyglasses9241
    @throughmyglasses9241 11 месяцев назад +3

    My Solution :
    df.withColumn("math",when(df.Subject=="math",df.Marks).otherwise(0))\
    .withColumn("eng",when(df.Subject=="eng",df.Marks).otherwise(0))\
    .groupBy("Name").agg(max("math").alias("math"),max("eng").alias("eng")).show()

  • @venkatsubbaiah4691
    @venkatsubbaiah4691 10 месяцев назад +2

    df1=df.groupBy("Name").pivot("sub").agg({"Marks" : "last"})
    df1.show()
    This code will give you irrspective how many subject you have in Sub col umn as different columns

  • @rawat7203
    @rawat7203 11 месяцев назад +2

    Hi Sir
    My Way:
    df1 = df.groupBy("Name").pivot("Sub").agg(first(col("Marks")))
    df2 = df1.select("Name", "math", "eng").orderBy(col('math').desc(),col('eng').desc())
    df2.show()

  • @2412_Sujoy_Das
    @2412_Sujoy_Das 11 месяцев назад +2

    Sagar, I had a query.... For using collect_list command, we have to sort the dataset based on subject first, right?
    My Solution:
    df_1 = spark.createDataFrame(data=data,schema=["Name","Sub","Marks"])
    df_2 = df_1.groupBy(col("Name")).pivot("Sub",["math","eng"]).agg(sum("Marks"))
    or,
    df_1.createOrReplaceTempView("Pivot_Data")
    display(spark.sql("Select Name, SUM(CASE WHEN sub like 'math' THEN Marks ELSE 0 END) as Math, SUM(CASE WHEN sub like 'eng' THEN Marks ELSE 0 END) as Eng from Pivot_Data GROUP BY Name"))

    • @GeekCoders
      @GeekCoders  11 месяцев назад

      You are right, collect list will give random result

  • @binareshit
    @binareshit 10 месяцев назад

    please english language azure datbricks
    required plese

  • @NehaAgarwal-l8l
    @NehaAgarwal-l8l 8 месяцев назад +1

    df.groupBy("Name").agg(max(when(df.Sub=='math',df.Marks).otherwise(0)).alias("Math"),max(when(df.Sub=='eng',df.Marks).otherwise(0)).alias("eng"))

  • @syedadnan4910
    @syedadnan4910 11 месяцев назад +4

    df.groupBy(df.Name).pivot(df.sub).agg(max(df.marks)).show()
    Or
    df.groupBy(df.Name).pivot(df.sub).agg(first(df.marks)).show()

    • @vasisultan8896
      @vasisultan8896 11 месяцев назад

      It is giving me the " Column is not iterable" error. can you please suggest on it?

    • @ashutoshsharma7119
      @ashutoshsharma7119 11 месяцев назад

      pivot function supports only string values. Try using string notation for pivot instead of dot notation.@@vasisultan8896

    • @syedadnan4910
      @syedadnan4910 11 месяцев назад

      df.groupBy("Name").pivot("sub").agg(max("marks"))
      or
      df.groupBy("Name").pivot("sub").agg(first("marks"))
      try this one@@vasisultan8896

    • @prachideokar7639
      @prachideokar7639 10 месяцев назад

      Max ("marks") kyu??

  • @arshadnehal246
    @arshadnehal246 3 месяца назад

    My solution
    result = df.groupBy('Name').agg(collect_list(col('sub')).alias('subs'),collect_list(col('Marks')).alias('marks'))
    col_name = [result.select('subs').first()[0][0],result.select('subs').first()[0][1]]
    result_df = result.withColumn(col_name[0],col('marks')[0]) \
    .withColumn(col_name[1],col('marks')[1]).select('Name','Math','Eng')
    result_df.show()

  • @amanmaheshwari34
    @amanmaheshwari34 6 месяцев назад +1

    df.groupby(col("Name")).agg(
    sum(when(col("Sub")=="math",col("Marks")).otherwise(0)).alias("maths"),
    sum(when(col("Sub")=="eng",col("Marks")).otherwise(0)).alias("eng")
    ).show()

  • @KRPB88
    @KRPB88 11 месяцев назад +1

    df.groupBy("Name").pivot("Sub").sum("Marks")

  • @ayushmangal4417
    @ayushmangal4417 8 месяцев назад

    df.groupBy('name').pivot('Sub', ['math','eng']).sum('Marks').display()

  • @kunalshinkar3367
    @kunalshinkar3367 10 месяцев назад

    df_sub1 = df_sub.groupBy('Name').agg(collect_list('Marks').alias('Sub_Marks'))
    df_sub1.withColumn('math',df_sub1.Sub_Marks[0]).withColumn('eng',df_sub1.Sub_Marks[1]).select('Name','math','eng').show()

  • @balaa2670
    @balaa2670 11 месяцев назад

    df.groupBy(f.col("Name")).pivot("Sub",[i[0] for i in df.select("Sub").distinct().collect()]).agg(f.sum("Marks"))

    • @JaijoJohn
      @JaijoJohn 4 месяца назад

      Good Solution

  • @biramdevpawar9902
    @biramdevpawar9902 11 месяцев назад

    df.groupBy("Name").pivot("Sub").agg(first("Marks")).orderBy("Name").show()

  • @amanmaheshwari34
    @amanmaheshwari34 6 месяцев назад

    df.groupBy("Name").pivot("Sub").sum("Marks").show()