PySpark Interview Questions (2025) | PySpark Real Time Scenarios

  • Published: 31 Jan 2025

Comments • 167

  • @AnshLambaJSR  13 days ago +22

    Now YOU can practice PySpark coding questions without paying SO MUCH to paid platforms. I have spent a lot of time creating this SOLUTION so that you don't need to PAY anyone. Looking for your SUPPORT ❤

    • @shyammaths5705  11 days ago

      Really, really love you, bro! The amount of hard work you are putting into teaching is pushing us to study hard.

  • @anuragpandey5369  12 days ago +5

    "You have a gift for explaining difficult topics with clarity. Your videos have saved countless hours of frustration!"
    Thanks bro....

    • @SatyamKumarJha-d2e  11 days ago +1

      Stop using GPT for commenting. That quoted one is 100% GPT.

    • @anuragpandey5369  11 days ago

      @SatyamKumarJha-d2e That means you're also searching for the same things on ChatGPT....😂

    • @anuragpandey5369  11 days ago +1

      @SatyamKumarJha-d2e That means you're also using GPT, right?? So please add nice comments to the chat instead of sulking like an offended uncle (NARAZ FUFA)...😂

    • @AnshLambaJSR  10 days ago

      Thank you for your kind words:) Happy Learning

  • @navyarangu6916  1 day ago

    I have seen a lot of PySpark videos, but your teaching is the best.

  • @lanreuzamere4994  12 days ago +3

    This came in just in time! Blessings bro

  • @sairam8406  11 days ago +3

    No words for your hard work bro, Thank you so much for bringing us this knowledge.

  • @pavankumar12121  12 days ago +1

    Your content is going crazy, and for DE I can say your channel is a one-stop solution.

  • @MrSonu1505  12 days ago +1

    Thank you Ansh Bro. I was waiting for this video for a long time.

    • @AnshLambaJSR  12 days ago

      I am happy that you loved this video :)

  • @aman_nv  8 days ago

    Really amazing content, well explained. I'm transitioning to data engineering, so I have been learning PySpark for interviews.
    I believe this is all you need, especially for entry-level DE job roles. Thanks so much, Ansh!!!

    • @AnshLambaJSR  5 days ago

      Thank you for your kind words :)

  • @rahulborate7034  12 days ago +1

    Hi Ansh,
    I am a big fan of yours, as you are really doing good work sharing proper knowledge and helping people save their money.
    Keep it up 👍

    • @AnshLambaJSR  10 days ago

      Happy you found my videos useful :)

  • @somanathking4694  12 days ago +1

    Thanks buddy, I am learning new frameworks from your tutorials ❤ it is very helpful 🙏🏾

    • @AnshLambaJSR  10 days ago +1

      Glad to hear that. Happy Learning :)

  • @SaiDheerajKanaparthi  3 days ago

    For question 10:
    It's asking to remove duplicates while preserving the order. While the row_number function does the job, it doesn't maintain the order.
    Instead,
    df_deduplicated = df.dropDuplicates()
    df_deduplicated.display()
    maintains the order.
    Love your videos.
    Love your aura.
    Keep making more videos.

  • @sulthanmohiddin5294  12 days ago +1

    Thanks a lot, Ansh... this is a masterpiece for the PySpark interview.

    • @AnshLambaJSR  12 days ago

      Thank you for your kind words :)

  • @gouthambheema1779  12 days ago +1

    Awesome man… We appreciate your efforts❤❤

    • @AnshLambaJSR  12 days ago +1

      Glad to hear this:) Happy Learning!

  • @swarnavadutta5266  5 days ago +1

    Timestamps (Powered by Merlin AI)
    00:06 - Prepare for 2025 PySpark interviews with comprehensive resources.
    02:50 - Introduction to a specialized PySpark interview notebook for real-time scenarios.
    07:34 - Preparing PySpark interview resources to boost confidence.
    09:52 - Creating a supportive Telegram community for resource sharing and error debugging.
    14:27 - Creating a free Databricks Community Edition account.
    16:40 - Guide on accessing Databricks for PySpark interview preparation.
    20:21 - Understanding duplicates in PySpark DataFrames is crucial for interviews.
    22:14 - Effective interview strategies for handling coding questions.
    26:07 - Understanding data engineer responsibilities and data handling techniques is crucial.
    28:15 - Career preparation: Watch recommended video for complete interview readiness.
    32:16 - Converting date columns in PySpark DataFrames efficiently.
    34:32 - Sorting and deduplicating data in PySpark efficiently.
    38:22 - Understanding schema merging is crucial for handling inconsistent data in PySpark.
    40:34 - PySpark optimizes data processing by reducing disk writes using in-memory computations.
    45:01 - Handling null values in PySpark DataFrames using fillna.
    47:07 - Calculate and retrieve top five users by total actions.
    51:06 - Using window functions to analyze recent customer transactions
    53:18 - Demonstrating filtering and sorting of DataFrame in PySpark.
    57:37 - Identifying customers without purchases in 30 days using PySpark.
    59:50 - Calculate users with purchase gaps over 30 days using PySpark.
    1:03:59 - Transforming a text column into an array and exploding its values.
    1:06:19 - Grouping and counting words in PySpark with aliasing techniques.
    1:10:44 - Calculate cumulative sales sum over time using PySpark.
    1:12:49 - Using cumulative sum with window functions in PySpark.
    1:17:18 - Using row number to retain order while removing duplicates.
    1:19:32 - Surrogate keys simplify database design by replacing complex primary keys.
    1:23:41 - Focus on covering all key topics, including data aggregation in PySpark.
    1:25:54 - Double aggregation to find top-selling products per month.
    1:30:33 - Extract highest sales product per month using dense rank.
    1:33:17 - Debugging PySpark code focuses on correcting syntax and indentation issues.
    1:37:40 - Understanding Spark architecture and components in PySpark.
    1:39:54 - Understanding Spark architecture and driver node functionality.
    1:44:25 - Applying merge conditions using Delta tables in PySpark.
    1:46:53 - Understanding upsert and schema inference in PySpark.
    1:50:54 - Understanding RDDs, DataFrames, and Datasets in PySpark.
    1:52:53 - DataFrames enhance usability and integration in PySpark compared to RDDs.
    1:57:13 - Understanding logical and physical plans in PySpark for optimization.
    1:59:29 - Spark chooses the optimal join type based on cost models.
    2:03:51 - Understanding Spark's entry points and transformation types.
    2:05:59 - Narrow transformations allow independent data processing without shuffling between machines.
    2:10:27 - Understanding partitioning and data management in PySpark.
    2:12:31 - Understanding data storage and management in PySpark using cache and persist.
    2:17:11 - Partitions in PySpark enable massive parallel processing for efficient data handling.
    2:19:17 - Counting employees by department and classifying sales transactions.
    2:24:03 - Understanding fundamental concepts is essential for implementing solutions in data processing.
    2:26:14 - Using current timestamp for tracking record changes in PySpark.
    2:30:36 - Understanding temporary and global views in PySpark SQL.
    2:32:59 - Understanding Global Temp views and flattening nested structures in PySpark.
    2:37:37 - Understanding data partitioning in PySpark for optimized storage.
    2:39:45 - Using Snappy compression enhances performance with Parquet files.
    2:43:58 - Understanding the optimize and Z order by commands in PySpark.
    2:46:05 - Understanding Z-Order and Data Skipping in Spark for Efficient Querying.
    2:50:20 - Understanding DataFrame actions and lazy evaluation in PySpark.
    2:52:40 - Understanding actions and lazy evaluation in PySpark.
    2:57:22 - Key advantages of Delta Lake and memory management in PySpark jobs.
    2:59:48 - Optimizing memory and mitigating data skew with AQE in PySpark.
    3:04:31 - Dynamic join optimization in PySpark enhances performance.
    3:06:29 - Discussing skew data handling methods in PySpark using salting and AQE.
    3:10:56 - Understanding Spark's memory management and Delta Lake's time travel features.
    3:13:18 - Utilize Delta Lake's time travel feature to recover deleted data.
    3:17:49 - Using collect_list for aggregating product names by category in PySpark.
    3:20:38 - Using collect_set to find unique product IDs per customer.
    3:25:49 - Using concat_ws for efficient string concatenation in PySpark.
    3:28:23 - Calculate product counts for customers using PySpark DataFrame operations.
    3:33:02 - Validating phone numbers using prefix filtering in PySpark.
    3:35:25 - Calculate average courses per student from a dataset.
    3:40:28 - Discussion on preparing for PySpark interviews with valuable resources.
    3:43:01 - Determine if postal codes are standard or custom based on length.

  • @DevendraKumar-ch6kw  12 days ago +1

    Superb work bro .... Great energy 🎉

    • @AnshLambaJSR  10 days ago

      Glad you liked it:) Happy Learning

  • @vu8354  12 days ago +2

    Your Data Fam is here

  • @seetharam945  12 days ago +2

    Thanks, Ansh bro

  • @hariprasad3820  12 days ago +1

    Thank you, Ansh. I'm currently studying the PySpark course video; once it is completed, this will be very useful ❤

    • @AnshLambaJSR  10 days ago +1

      Great that you are watching my other course as well! Happy learning :)

  • @paniarabinda  12 days ago +2

    You are awesome @Ansh Lamba

  • @gopichand5717  12 days ago +2

    Thank you ansh
    Thanks for sharing
    Great content - this will help😊

    • @AnshLambaJSR  10 days ago

      Happy that you found it useful :)

  • @generations_together  8 days ago

    Love you, bro ❤ Love the way you explain. Very informative.

    • @AnshLambaJSR  5 days ago

      I'm happy that you loved this video :)

  • @soundaryakv668  5 days ago

    Hey Ansh, I can’t thank you enough for this wonderful Masterpiece. Your explanation is always clear and really made a difference in how I understood the material. You have a great teaching style that makes even the most complicated topics seem accessible. I’ve learned so much from you, and I’m really grateful for the time and effort you put into helping us improve our skills!♥👏👏👏

  • @ganeshawarde8037  12 days ago +1

    Waiting for this❤

  • @ronakpatil9402  12 days ago +1

    Fabulous Ansh❤

  • @ManvithaTangella  5 days ago

    This is so helpful! But I'm unable to access the notebook. Could you please check?

    • @AnshLambaJSR  5 days ago

      It's here: github.com/anshlambagit/PySparkInterview

  • @adiravikumarkorada3454  3 days ago

    You are awesome, bro...
    Great

  • @Vinay-m9u3r  7 days ago

    Covered all the things, bro. This video is enough for preparation. Thanks a ton.....!!!!!👌👌👏👏👏🙌🙌🙌🙌

  • @rashmigowda3528  6 days ago

    Awesome explanation!!!! And really helpful content. Kindly make a similar video on SCALA as well.

  • @PrashantKumar-wz3ex  12 days ago +1

    Thanks a lot

  • @dnyanobanavale5172  11 days ago +1

    awesome Bro 🙏🙏

  • @rajasreejanyavula  8 days ago +1

    Hello Ansh, I am just in love with your content, bro. You made my life easier, thank you so much. I am happy I found your channel. I have a request for you: what kind of SQL questions are asked for the Azure data engineer role? If possible, can you make a video on it?

    • @AnshLambaJSR  5 days ago +1

      Thank you for your kind words :) Stay tuned

  • @WaseemPashaZ  12 days ago

    Learning Growing thanks ❤

  • @AmitKumar-gw1he  12 days ago +1

    Thanks a lot Ansh ❤

  • @Sunil999G  12 days ago +1

    Thanks brother 💖

  • @MickeyMack-qf2ey  11 days ago +1

    Informative one, broooo!!!!
    Can you please create videos on Delta Live Tables?

  • @Dk223-w9g  12 days ago +1

    Like it bro...❤❤

  • @krthiak  12 days ago +1

    This is brilliant

    • @AnshLambaJSR  12 days ago

      Thank you for your kind words :)

  • @RAHULKUMAR-px8em  12 days ago +1

    Guru 🔥🔥🔥🔥🔥🔥

  • @dhirendramaurya5361  12 days ago +1

    You are awesome bro ❤ wow😮

    • @AnshLambaJSR  12 days ago

      Thank you for your kind words:) Happy Learning!

  • @rameshkandi  9 days ago

    Thank you Ansh.

  • @RadhaKrishna-i7s  12 days ago +1

    You are awesome ❤@Ansh bro

  • @shubhambhosale8467  9 hours ago

    Not able to open the .dbc notebook. Can you please change the extension and send it? Thank you.

  • @omkarbhosale4173  12 days ago +1

    Wow, thanks Ansh

  • @AnkitSharma-ds9mo  12 days ago +1

    Thanks Ansh

  • @iPhone-vk3bt  11 days ago +1

    Hey Ansh, can we get interview questions for Azure Data Factory too? Thank you so much for your wonderful help.

  • @Vinay-m9u3r  7 days ago

    Thanks a lot, Ansh. Can you make the same kind of real-time interview questions for ADF as well? It's so helpful.

  • @aprajitapandey482  11 days ago +1

    Hi @AnshLambaJSR, I can't use the DBC file on my work laptop. Can you please provide it in CSV format?

    • @aniketghodake1366  11 days ago

      Bro, how can a notebook output be CSV?

    • @aprajitapandey482  11 days ago

      @aniketghodake1366 So, which app should I use to open it?

  • @throwaway-z1j  11 days ago

    Ansh bro, I struggle a lot while answering CI/CD-related questions in interviews. Can you explain how teams move code between environments and do prod deployments in Databricks? Thanks.

  • @JoyalJoseph-h4u  6 days ago

    Hi Ansh, I'm unable to view the notebook on GitHub; it seems empty. Can you reshare it?

  • @knowledge4686  11 days ago

    Hi Ansh,
    At 1:09:26 there are two "product" rows, one of which has a "," attached to it, and that makes the count for "product" come out as 1.
    How would you correct that issue?

  • @Morphyto  9 days ago

    Hey @AnshLambaJSR Love your videos. Can we also get a video on Azure interview questions??

    • @AnshLambaJSR  5 days ago

      Stay tuned and Happy Learning :)

  • @GauravSubodh  12 days ago +1

    OP bro OP........

  • @Chantilocal_007  12 days ago +2

    You know a famous Telugu politician's dialogue: "Nenu vinnanu nenu vunnanu" ("I have heard you, I am here for you"). It fits you perfectly.

  • @sanketsalokhe4603  1 day ago

    Thank you, ANSH, for the great content...❤
    Please make one for AZURE.

  • @shahzan525  12 days ago

    Are you arranging any classes?

  • @Prasannabigdata  8 days ago

    Kindly requesting: will you post a complete Scala explanation video on YouTube? 🤗

  • @uoops87k8j76  11 days ago +1

    Also, checkpoint(): what does it do?

  • @Shantha-li2po  10 days ago +1

    Can you please make a tutorial on Apache Airflow?

  • @Samudra1419  9 days ago

    How did you guys open this file?
    It's in .dbc format.
    Edit: Steps to import a .dbc file into Databricks:
    1. Log in to Databricks.
    2. Go to the Workspace tab.
    3. Navigate to a folder or create a new folder.
    4. Click the "Import" button (usually at the top right).
    5. Choose the .dbc file (PySpark Interview.dbc) from your local machine.
    6. Click Import.

  • @unknownstar8502  12 days ago

    Bro, can you make a video on optimization in ADF and Databricks, and on how to do data quality checks in the bronze layer?

  • @kaushikmishra6071  10 days ago

    Ansh, can you make an ADF real-time scenario questions playlist?

  • @gudiatoka  12 days ago

    Bro, rather than a single video,
    break this one into a series. It will be more helpful during preparation.

  • @samlimvtchos6105  9 days ago

    Make a video on SQL and more queries too.

  • @expertpkr4077  12 days ago +1

    Thank you❤ thank you ❤ thank you❤ thank you❤ thank you❤ thank you❤ thank you ❤ , ............ Thanks ❤ a lot for the video , its very useful ❤

    • @AnshLambaJSR  10 days ago

      I am happy that you loved this PySpark Interview questions video:)

  • @vsu1225  11 days ago

    Hi Ansh, while starting the Azure free trial I'm getting a phone number error: "We’re unable to validate your phone number." I tried phone numbers from different countries and get the same error. Please assist me.

  • @therawscholar  12 days ago

    bang on bro

  • @PracticePrecision  9 days ago

    Bro, the content is good, but one issue: no one can understand your content when in a hurry, I mean at 2x speed.

  • @SambaMitta  9 days ago

    Unable to download the notebook.

  • @Butterfly-oy5tr  12 days ago

    If you make one video for AIRFLOW, that would be helpful for many people

  • @dipakchavan4659  10 days ago

    Is DP-203 expiring, with a new Microsoft Fabric certification being introduced? Is that true? Is ADF going away, and will it become part of Fabric in the future? Will everything be under Fabric for data engineers? Can you please make a video on this to clear up the confusion about which certifications will be useful for Azure data engineers in the future, and which side we should focus on most: Fabric, ADF, Synapse, or Databricks? Please make one video on a priority basis. Thanks in advance, Ansh.

    • @AnshLambaJSR  10 days ago

      Stay tuned and Happy Learning :)

  • @fahadbawazir562  12 days ago +1

    HI ANSH, GOOD START IN 2025

  • @prafulhsolanki  3 days ago

    For Q10: "While preparing a data pipeline, you notice some duplicate rows in a dataset.
    How would you remove the duplicates without affecting the original order?"
    The correct answer is to use monotonically_increasing_id along with dropDuplicates to preserve the order; we can't use a Window function with ROW_NUMBER.
    from pyspark.sql import functions as f  # `spark` is the notebook's SparkSession
    data = [("John", 25), ("Jane", 30), ("John", 25), ("Alice", 22)]
    columns = ["name", "age"]
    df = spark.createDataFrame(data, columns)
    df.show()
    # Tag each row with an increasing (not consecutive) id reflecting read order
    df = df.withColumn("index", f.monotonically_increasing_id())
    df.printSchema()
    df.show()
    # Drop duplicate (name, age) pairs, restore the original order, then remove the helper column
    df = df.dropDuplicates(["name", "age"]).orderBy("index").drop("index")
    df.show()

  • @_udaalgirakio_o  12 days ago

    Bro, what about machine learning with Azure data engineering?

  • @Butterfly-oy5tr  12 days ago +2

    Finally, data engineer aspirants have found the YouTube channel where they can learn everything 🎉
    Thanks a lot✨🛐

    • @AnshLambaJSR  10 days ago

      Thank you so much for your kind words :)

  • @moyeenshaikh4378  3 days ago

    Bro, SQL and Python questions?

  • @RaushanMaths  12 days ago +1

    Great content as always ❤ You are a gift for your data fam🎉

    • @AnshLambaJSR  10 days ago

      Thank you for your kind words :)

  • @Nick-du9ss  11 days ago +1

    Make interview questions on ADF.

  • @krishnagupta5239  12 days ago +1

    Bro, please bring an end-to-end project on Fabric.

  • @jairam00669  12 days ago +5

    Ansh, I have a small suggestion for you. It would be great if you could upload videos on Friday nights. Since many of us are working professionals, this would allow us to dedicate time over the weekend, on Saturday and Sunday, to watch and engage with the content.

    • @AnshLambaJSR  12 days ago

      Thank you for your suggestion :)

    • @SuryaJN  12 days ago

      Exactly, I was thinking of commenting the same. Hi Ansh, this would help us so much 😊

  • @abhay1446  12 days ago

    Brother, how can we get a data engineering job as a fresher? I only see openings asking for 3+ years of experience in the MUMBAI location.

  • @Uda_dunga  12 days ago +2

    Waiting for the DLT project

  • @rishangnp3217  12 days ago

    Azure Key Vault video

  • @lakshmiprasadpresident4462  12 days ago

    Airflow tuition request

  • @kirankarthikeyan4940  12 days ago +1

    Great! Please continue the Azure DE interview series.

  • @rishangnp3217  12 days ago +1

    Superb video. Companies are now asking about Azure DevOps and CI/CD integration as well.

  • @rashuagarwal8119  1 day ago +1

    Bro, your face isn't even that good, so stop showing off so much and just teach straightforwardly.

  • @rishangnp3217  12 days ago

    Azure Logic Apps

  • @fahadbawazir562  12 days ago +2

    Also, please make a video on why AZURE DATA ENGINEERING is retiring and how AZURE FABRIC is replacing AZURE DATA ENGINEERING.

    • @krthiak  12 days ago

      I have spent 6 years using Azure tools. Should I take the DP-203 certification and complete it, or wait, learn and prepare for MS Fabric?

    • @AnshLambaJSR  12 days ago

      Stay tuned :)

  • @kasimbasha9789  12 days ago +1

    First of all, thank you for your efforts! Your content is truly helpful for those who are serious about the data engineering field. I have recommended your channel to many people, including my colleagues and friends. They have subscribed and are regularly following your videos. Please continue creating more videos; we are here to support you always 😊🙏🙂

    • @AnshLambaJSR  10 days ago

      Glad to hear this:) Happy Learning :)

  • @sambitpati6129  12 days ago +1

    our SUPERHERO is back !!! 💟❣❣💟💟❣❣❤‍🔥❤‍🔥❤‍🔥❤‍🔥

    • @AnshLambaJSR  10 days ago

      Happy to see such excitement for my videos :) Happy Learning!

  • @RKu-iv7qr  8 days ago

    Worst, tacky ("chapri") way of talking

    • @AnshLambaJSR  5 days ago

      Thanks for the lovely compliment

  • @uoops87k8j76  11 days ago +1

    For question 1:
    Can we use this? It's easier, right?
    from pyspark.sql import functions as F  # avoids shadowing Python's built-in max
    df1 = df.groupBy("product_id").agg(F.max("sales").alias("sales"), F.max("date").alias("date"))
    df1.display()

  • @ILoveSQL  12 days ago +1

    Please create a video on ADF interview questions for real-time scenarios, and also one end-to-end project on ADF alone that covers the most-used activities and important scenarios. @anshLambaJSR