Now YOU can practice PySpark Coding questions without paying SO MUCH to the paid platforms. I have spent a lot of time creating this SOLUTION so that you don't need to PAY anyone. Looking for your SUPPORT ❤
Really, really love you bro. The amount of hard work you are putting in to teach is pushing us to study harder.
"You have a gift for explaining difficult topics with clarity. Your videos have saved countless hours of frustration!"
Thanks bro....
Stop using GPT for commenting. That quoted one is 100% GPT.
@SatyamKumarJha-d2e That means you are also searching for the same things on ChatGPT....😂
@SatyamKumarJha-d2e It means you are also using GPT, right??... So please add nice comments to the chat instead of sulking like an upset uncle (naraz fufa)...😂
Thank you for your kind words:) Happy Learning
I have seen a lot of PySpark videos, but your teaching is the best.
This came in just in time! Blessings bro
Glad this video helped you :)
No words for your hard work bro, Thank you so much for bringing us this knowledge.
Glad you liked it :)
Your content is going crazy, and for DE I can say your channel is a one-stop solution.
Indeed!
Thank you Ansh Bro. I was waiting for this video for a long time.
I am happy that you loved this video :)
Really amazing content, well explained. I'm transitioning to data engineering, so I have been learning PySpark for interviews.
I believe this is all you need, especially for entry-level DE job roles. Thanks so much, Ansh!!!
Thank you for your kind words :)
Hi Ansh,
I am a big fan of yours, as you are really doing good work to spread proper knowledge and help people save their money.
Keep it up 👍
Happy you found my videos useful :)
Thanks buddy, I am learning new frameworks from your tutorials ❤ it is very helpful 🙏🏾
Glad to hear that. Happy Learning :)
For question 10,
it asks to remove duplicates while preserving the order. While the row_number function does the job, it does not maintain the order.
Instead,
df_deduplicated = df.dropDuplicates()
df_deduplicated.display()
maintains the order.
Love your videos.
Love your aura.
Keep making more videos.
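A minimal sketch of the idea above, assuming a toy DataFrame in a Databricks notebook; note that Spark does not strictly guarantee row order after dropDuplicates, so treat this as an illustration rather than a guarantee:
# hypothetical example data; column names are only for illustration
data = [("John", 25), ("Jane", 30), ("John", 25), ("Alice", 22)]
df = spark.createDataFrame(data, ["name", "age"])
# drop exact duplicate rows; on small data the input order is usually kept,
# but Spark's API does not promise any ordering after this transformation
df_deduplicated = df.dropDuplicates()
df_deduplicated.display()  # display() is available in Databricks notebooks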
Thanks a lot, Ansh... this is a masterpiece for the PySpark interview.
Thank you for your kind words :)
Awesome man… We appreciate your efforts❤❤
Glad to hear this:) Happy Learning!
Timestamps (Powered by Merlin AI)
00:06 - Prepare for 2025 PySpark interviews with comprehensive resources.
02:50 - Introduction to a specialized PySpark interview notebook for real-time scenarios.
07:34 - Preparing PySpark interview resources to boost confidence.
09:52 - Creating a supportive Telegram community for resource sharing and error debugging.
14:27 - Creating a free Databricks Community Edition account.
16:40 - Guide on accessing Databricks for PySpark interview preparation.
20:21 - Understanding duplicates in PySpark DataFrames is crucial for interviews.
22:14 - Effective interview strategies for handling coding questions.
26:07 - Understanding data engineer responsibilities and data handling techniques is crucial.
28:15 - Career preparation: Watch recommended video for complete interview readiness.
32:16 - Converting date columns in PySpark DataFrames efficiently.
34:32 - Sorting and deduplicating data in PySpark efficiently.
38:22 - Understanding schema merging is crucial for handling inconsistent data in PySpark.
40:34 - PySpark optimizes data processing by reducing disk writes using in-memory computations.
45:01 - Handling null values in PySpark DataFrames using fillna.
47:07 - Calculate and retrieve top five users by total actions.
51:06 - Using window functions to analyze recent customer transactions.
53:18 - Demonstrating filtering and sorting of DataFrame in PySpark.
57:37 - Identifying customers without purchases in 30 days using PySpark.
59:50 - Calculate users with purchase gaps over 30 days using PySpark.
1:03:59 - Transforming a text column into an array and exploding its values.
1:06:19 - Grouping and counting words in PySpark with aliasing techniques.
1:10:44 - Calculate cumulative sales sum over time using PySpark.
1:12:49 - Using cumulative sum with window functions in PySpark.
1:17:18 - Using row number to retain order while removing duplicates.
1:19:32 - Surrogate keys simplify database design by replacing complex primary keys.
1:23:41 - Focus on covering all key topics, including data aggregation in PySpark.
1:25:54 - Double aggregation to find top-selling products per month.
1:30:33 - Extract highest sales product per month using dense rank.
1:33:17 - Debugging PySpark code focuses on correcting syntax and indentation issues.
1:37:40 - Understanding Spark architecture and components in PySpark.
1:39:54 - Understanding Spark architecture and driver node functionality.
1:44:25 - Applying merge conditions using Delta tables in PySpark.
1:46:53 - Understanding upsert and schema inference in PySpark.
1:50:54 - Understanding RDDs, DataFrames, and Datasets in PySpark.
1:52:53 - DataFrames enhance usability and integration in PySpark compared to RDDs.
1:57:13 - Understanding logical and physical plans in PySpark for optimization.
1:59:29 - Spark chooses the optimal join type based on cost models.
2:03:51 - Understanding Spark's entry points and transformation types.
2:05:59 - Narrow transformations allow independent data processing without shuffling between machines.
2:10:27 - Understanding partitioning and data management in PySpark.
2:12:31 - Understanding data storage and management in PySpark using cache and persist.
2:17:11 - Partitions in PySpark enable massive parallel processing for efficient data handling.
2:19:17 - Counting employees by department and classifying sales transactions.
2:24:03 - Understanding fundamental concepts is essential for implementing solutions in data processing.
2:26:14 - Using current timestamp for tracking record changes in PySpark.
2:30:36 - Understanding temporary and global views in PySpark SQL.
2:32:59 - Understanding Global Temp views and flattening nested structures in PySpark.
2:37:37 - Understanding data partitioning in PySpark for optimized storage.
2:39:45 - Using Snappy compression enhances performance with Parquet files.
2:43:58 - Understanding the OPTIMIZE and ZORDER BY commands in PySpark.
2:46:05 - Understanding Z-Order and Data Skipping in Spark for Efficient Querying.
2:50:20 - Understanding DataFrame actions and lazy evaluation in PySpark.
2:52:40 - Understanding actions and lazy evaluation in PySpark.
2:57:22 - Key advantages of Delta Lake and memory management in PySpark jobs.
2:59:48 - Optimizing memory and mitigating data skew with AQE in PySpark.
3:04:31 - Dynamic join optimization in PySpark enhances performance.
3:06:29 - Discussing skewed-data handling methods in PySpark using salting and AQE.
3:10:56 - Understanding Spark's memory management and Delta Lake's time travel features.
3:13:18 - Utilize Delta Lake's time travel feature to recover deleted data.
3:17:49 - Using collect_list for aggregating product names by category in PySpark.
3:20:38 - Using collect_set to find unique product IDs per customer.
3:25:49 - Using concat_ws for efficient string concatenation in PySpark.
3:28:23 - Calculate product counts for customers using PySpark DataFrame operations.
3:33:02 - Validating phone numbers using prefix filtering in PySpark.
3:35:25 - Calculate average courses per student from a dataset.
3:40:28 - Discussion on preparing for PySpark interviews with valuable resources.
3:43:01 - Determine if postal codes are standard or custom based on length.
Superb work bro .... Great energy 🎉
Glad you liked it:) Happy Learning
Your Data Fam is here
Glad to see the excitement
Thanks Ansh bhai
Happy that you liked it:)
Thank you Ansh, I am currently studying the PySpark course video... once it is completed, this will be very useful ❤
Great that you are watching my other course as well! Happy learning :)
You are awesome @Ansh Lamba
Happy Learning :)
Thank you ansh
Thanks for sharing
Great content - this will help😊
Happy that you found it useful :)
Love you bro❤ love the way of your explanation..very informative
I'm happy that you loved this video :)
Hey Ansh, I can’t thank you enough for this wonderful Masterpiece. Your explanation is always clear and really made a difference in how I understood the material. You have a great teaching style that makes even the most complicated topics seem accessible. I’ve learned so much from you, and I’m really grateful for the time and effort you put into helping us improve our skills!♥👏👏👏
Waiting for this❤
Happy Learning :)
Fabulous Ansh❤
Happy Learning :)
This is so helpful! But I'm unable to access the notebook... could you please check?
It's there: github.com/anshlambagit/PySparkInterview
You are awesome bro...
Great
Covered everything, bro, this video is enough for preparation. Thanks a ton.....!!!!!👌👌👏👏👏🙌🙌🙌🙌
Happy to hear that :)
Awesome explanation!!!! And really helpful content. Kindly make such a video on Scala as well...
Stay tuned :)
Thanks a lot
awesome Bro 🙏🙏
Happy Learning :)
Hello Ansh, I am just in love with your content, bro. You made my life easier, thank you so much; I am happy I found your channel. I have a request for you: what kind of SQL questions will be asked for the Azure data engineer role? If possible, can you make a video on it?
Thank you for your kind words :) Stay tuned
Learning Growing thanks ❤
Happy Learning :)
Thanks a lot Ansh ❤
Happy learning:)
Thanks brother 💖
Glad you liked it :)
Informative one broooo!!!!
Can you please create videos on delta live tables?
Thank you. Stay Tuned :)
Like it bro...❤❤
Happy you liked it:)
This is brilliant
Thank you for your kind words :)
Guru 🔥🔥🔥🔥🔥🔥
Happy Learning :)
You are awesome bro ❤ wow😮
Thank you for your kind words:) Happy Learning!
Thank you Ansh.
Glad you liked it:)
You are awesome ❤@Ansh bro
Happy Learning :)
Not able to open the .dbc notebook. Can you please change the extension and send it? Thank you.
Wow thanks ansh
Glad you liked it:)
Thanks Ansh
Happy you liked it :)
Hey Ansh, can we get interview questions for Azure Data Factory as well? Thank you so much for your wonderful help.
Stay Tuned :)
Thanks a lot Ansh. Can you make the same kind of real-time interview questions for ADF as well? It would be so helpful.
Stay tuned :)
@ Thank you☺
Hi @AnshLambaJSR, I can't use a .dbc file on my work laptop. Can you please provide it in CSV format?
Bro, how can a notebook be CSV?
@aniketghodake1366 so, which app shall I use to open it?
Ansh bro, I struggle a lot while answering CI/CD-related questions in interviews. Can you explain how teams move code between environments and do prod deployments in Databricks? Thanks.
Hi Ansh, I'm unable to view the notebook on GitHub. It seems empty. Can you reshare it?
Hi Ansh,
at 1:09:26 there are two product rows, and one of them has a comma attached to it, which caused that product's count to come out as 1.
How are you going to correct that issue?
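One possible fix, as a minimal sketch only: strip punctuation such as commas from the text before splitting and exploding it. The column name "text", the sample data, and the pattern are assumptions for illustration, not the exact code from the video:
from pyspark.sql import functions as F
# hypothetical data reproducing the issue: a trailing comma sticks to a word
df = spark.createDataFrame([("apple banana,",), ("banana apple",)], ["text"])
clean = (
    df.withColumn("text", F.regexp_replace("text", ",", ""))  # drop commas so "banana," matches "banana"
      .withColumn("word", F.explode(F.split("text", " ")))    # one row per word
)
clean.groupBy("word").count().show()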
Hey @AnshLambaJSR Love your videos. Can we also get a video on Azure interview questions??
Stay tuned and Happy Learning :)
OP bro OP........
Happy Learning :)
You know that famous Telugu politician's dialogue: "Nenu vinnanu, nenu vunnanu". That fits you perfectly.
What is the meaning of this Telugu dialogue?
Thank you ANSH for great content...❤
Please make one for AZURE
Are you arranging any classes?
Kindly requesting: will you post a complete Scala explanation video on YouTube?🤗
Stay tuned :)
Also, what does checkpointing() do?
Can you please make a tutorial on Apache Airflow?
Stay tuned :)
How did you guys open this file?
It's in .dbc format.
Edit: steps to import a .dbc file into Databricks:
Log in to Databricks.
Go to the Workspace tab.
Navigate to a folder or create a new folder.
Click on the "Import" button (usually at the top-right).
Choose the .dbc file (PySpark Interview.dbc) from your local machine.
Click Import.
Bro, can you make a video on optimization in ADF/Databricks and how to do data quality checks in the bronze layer?
Ansh, can you make an ADF real-time scenario questions playlist?
Stay Tuned :)
Bro,
rather than a single video,
break this one into a series. It will be more helpful during preparation.
Make a video on SQL with more queries also.
Stay tuned :)
Thank you❤ thank you ❤ thank you❤ thank you❤ thank you❤ thank you❤ thank you ❤... Thanks ❤ a lot for the video, it's very useful ❤
I am happy that you loved this PySpark Interview questions video:)
Hi Ansh, starting with the Azure free trial, I'm getting a phone number error: "We're unable to validate your phone number." I tried phone numbers from different countries and get the same error. Please assist me.
bang on bro
Bro, the content is good... but one issue... no one can understand your content when in a hurry, I mean at 2x speed.
Don't be in a hurry, bro.
Unable to download the notebook.
If you make one video for AIRFLOW, that would be helpful for many people
Is DP-203 expiring? And is a new Microsoft Fabric certificate being introduced? Is it true? Is ADF no longer going to exist on its own; will it be part of Fabric in the future? Will everything be under Fabric in the future for data engineers? Can you please make a video on this for clarity, since there is confusion about which certifications will be useful in the future for Azure Data Engineers and which side we should focus on most: Fabric, ADF, Synapse, or Databricks? Please make one video on a priority basis. Thanks in advance, Ansh.
Stay tuned and Happy Learning :)
HI ANSH, GOOD START IN 2025
Happy you liked it :)
For Q10: While preparing a data pipeline, you notice some duplicate rows in a dataset.
How would you remove the duplicates without affecting the original order?
The correct answer is to use monotonically_increasing_id along with dropDuplicates to preserve the order; we can't rely on a window function with ROW_NUMBER here.
from pyspark.sql import functions as f
data = [("John", 25), ("Jane", 30), ("John", 25), ("Alice", 22)]
columns = ["name", "age"]
df = spark.createDataFrame(data, columns)
df.show()
# tag each row with a monotonically increasing id to remember the original order
df = df.withColumn("index", f.monotonically_increasing_id())
df.printSchema()
df.show()
# drop duplicates, restore the original order, then remove the helper column
df = df.dropDuplicates(["name", "age"]).orderBy("index").drop("index")
df.show()
Bro what about machine learning with azure data engineering?
Finally, data engineer aspirants have found the YouTube channel where they can learn everything 🎉
Thanks a lot✨🛐
Thank you so much for your kind words :)
Bro, SQL and Python questions?
Great content as always ❤ You are a gift for your data fam🎉
Thank you for your kind words :)
Make an interview questions video on ADF.
Stay tuned:)
Bro, please bring an end-to-end project on Fabric.
Stay tuned :)
Ansh, I have a small suggestion for you. It would be great if you could upload videos on Friday nights. Since many of us are working professionals, this would allow us to dedicate time over the weekend, on Saturday and Sunday, to watch and engage with the content.
Thank you for your suggestion :)
Exactly, I was thinking of commenting the same. Hi Ansh, this would help us so much😊
Brother, how can we get a data engineering job as a fresher? Because I only see openings asking for 3+ years of experience in the Mumbai location.
Waiting for the DLT project.
Stay Tuned :)
Azure Key Vault video
Airflow tuition request
Great, please continue the Azure DE interview series.
More to come!
Superb video. Nowadays companies are asking about Azure DevOps and CI/CD integration as well.
Stay tuned :)
Bro, your face isn't that great either, so stop showing off so much style and just teach straightforwardly.
Azure Logic Apps
Also, please make a video on why the Azure Data Engineering certification is retiring and how Microsoft Fabric is replacing it.
I have spent 6 years using Azure tools. Should I take the DP-203 certification and complete it, or wait, learn, and prep for MS Fabric?
Stay tuned :)
First of all, thank you for your efforts! Your content is truly helpful for those who are serious about the data engineering field. I have recommended your channel to many people, including my colleagues and friends. They have subscribed and are regularly following your videos. Please continue creating more videos; we are here to support you always 😊🙏🙂
Glad to hear this:) Happy Learning :)
our SUPERHERO is back !!! 💟❣❣💟💟❣❣❤🔥❤🔥❤🔥❤🔥
Happy to see such excitement for my videos :) Happy Learning!
Worst, tacky (chapri) way of talking.
Thanks for the lovely compliment
For question 1:
Can we use this? It's easier, right?
from pyspark.sql import functions as F
df1 = df.groupBy("product_id").agg(F.max("sales").alias("sales"), F.max("date").alias("date"))
df1.display()
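A side note on this, hedged and not from the video: agg(max("sales"), max("date")) computes each maximum independently, so the returned sales and date may come from different rows. A minimal sketch of a window-based alternative that keeps the whole top row per product, using hypothetical column names:
from pyspark.sql import functions as F
from pyspark.sql.window import Window
# hypothetical data: per product, keep the row with the highest sales
data = [(1, 100, "2025-01-01"), (1, 300, "2025-01-05"), (2, 200, "2025-01-02")]
df = spark.createDataFrame(data, ["product_id", "sales", "date"])
w = Window.partitionBy("product_id").orderBy(F.col("sales").desc())
top = (
    df.withColumn("rn", F.row_number().over(w))  # rank rows within each product by sales
      .filter(F.col("rn") == 1)                  # keep only the top row per product
      .drop("rn")
)
top.show()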
Please create a video on ADF interview questions for real-time scenarios, and also an end-to-end project on ADF only, covering the most commonly used activities and important scenarios. @anshLambaJSR
Stay tuned :)
@AnshLambaJSR thanks bro