All videos in this pyspark interview playlist are highly useful Sagar. Big Thanks for your efforts man!!
Use the pivot function on the Subject column to get a new column for each distinct value in that column, and apply the aggregate function sum on Marks. Note that the order of the Eng/Math columns may not be the same.
df1=df.groupBy("Name").pivot("sub").agg(sum("Marks"))
Using CASE WHEN is better, as collect_list may give incorrect output when a subject mark is null or missing for a person. Even in production queries, CASE WHEN would be used in such scenarios. 🙂
Yeah could be
Hi Sagar, to master PySpark, which of your courses should I buy?
Very useful, Sagar.
Was this asked in Tiger Analytics (Canada)?
pivoted_df = df.groupBy("Name").pivot("Subject").agg({"Marks": "first"})
pivoted_df.show()
This is the better answer
I don't know why he didn't use pivot function and made solution complicated. LoL
What does {"Marks": "first"} at the end mean?
@@prachideokar7639, first is an aggregate function that picks the first value encountered for each group.
In the above case:
GroupBy value: Name (Rudra)
Pivot value: Sub (math)
Marks: the first value for that combination is 79.
I tried the below:
df = df.groupBy(F.col("Name")).pivot(F.col("sub")).agg(F.max(F.col("Marks")))
df.show()
but it throws an error: jgd = self._jgd.pivot(pivot_col) Column is not iterable
My Solution :
df.withColumn("math",when(df.Subject=="math",df.Marks).otherwise(0))\
.withColumn("eng",when(df.Subject=="eng",df.Marks).otherwise(0))\
.groupBy("Name").agg(max("math").alias("math"),max("eng").alias("eng")).show()
df1 = df.groupBy("Name").pivot("sub").agg({"Marks": "last"})
df1.show()
This code will give you a separate column for each subject, irrespective of how many subjects the Sub column contains.
Hi Sir
My Way:
df1 = df.groupBy("Name").pivot("Sub").agg(first(col("Marks")))
df2 = df1.select("Name", "math", "eng").orderBy(col('math').desc(),col('eng').desc())
df2.show()
Sagar, I had a query: when using the collect_list command, we have to sort the dataset by subject first, right?
My Solution:
df_1 = spark.createDataFrame(data=data,schema=["Name","Sub","Marks"])
df_2 = df_1.groupBy(col("Name")).pivot("Sub",["math","eng"]).agg(sum("Marks"))
or,
df_1.createOrReplaceTempView("Pivot_Data")
display(spark.sql("Select Name, SUM(CASE WHEN sub like 'math' THEN Marks ELSE 0 END) as Math, SUM(CASE WHEN sub like 'eng' THEN Marks ELSE 0 END) as Eng from Pivot_Data GROUP BY Name"))
You are right, collect_list can return values in random order.
Please make Azure Databricks videos in English, required please.
df.groupBy("Name").agg(max(when(df.Sub=='math',df.Marks).otherwise(0)).alias("Math"),max(when(df.Sub=='eng',df.Marks).otherwise(0)).alias("eng"))
df.groupBy(df.Name).pivot(df.sub).agg(max(df.marks)).show()
Or
df.groupBy(df.Name).pivot(df.sub).agg(first(df.marks)).show()
It is giving me the "Column is not iterable" error. Can you please suggest a fix?
The pivot function accepts only a column name string. Try passing the name as a string instead of using dot notation. @@vasisultan8896
df.groupBy("Name").pivot("sub").agg(max("marks"))
or
df.groupBy("Name").pivot("sub").agg(first("marks"))
try this one@@vasisultan8896
Why max("marks")?
My solution
result = df.groupBy('Name').agg(collect_list(col('sub')).alias('subs'),collect_list(col('Marks')).alias('marks'))
col_name = [result.select('subs').first()[0][0],result.select('subs').first()[0][1]]
result_df = result.withColumn(col_name[0],col('marks')[0]) \
.withColumn(col_name[1],col('marks')[1]).select('Name','Math','Eng')
result_df.show()
df.groupby(col("Name")).agg(
sum(when(col("Sub")=="math",col("Marks")).otherwise(0)).alias("maths"),
sum(when(col("Sub")=="eng",col("Marks")).otherwise(0)).alias("eng")
).show()
Good approach
df.groupBy("Name").pivot("Sub").sum("Marks")
df.groupBy('name').pivot('Sub', ['math','eng']).sum('Marks').display()
df_sub1 = df_sub.groupBy('Name').agg(collect_list('Marks').alias('Sub_Marks'))
df_sub1.withColumn('math',df_sub1.Sub_Marks[0]).withColumn('eng',df_sub1.Sub_Marks[1]).select('Name','math','eng').show()
df.groupBy(f.col("Name")).pivot("Sub",[i[0] for i in df.select("Sub").distinct().collect()]).agg(f.sum("Marks"))
Good Solution
df.groupBy("Name").pivot("Sub").agg(first("Marks")).orderBy("Name").show()
df.groupBy("Name").pivot("Sub").sum("Marks").show()