71. Databricks | Pyspark | Window Functions: Lead and Lag

  • Published: Feb 2, 2025

Comments • 22

  • @jagadeeswaran330
    @jagadeeswaran330 10 months ago +1

    Great explanation sir, thank you!

  • @sravankumar1767
    @sravankumar1767 2 years ago +2

    Nice explanation Raja 👌 👍 👏

  • @sharmadtadkodkar2038
    @sharmadtadkodkar2038 1 year ago +1

    Great content, but I am not able to find all your videos on your page. Can you please add them to one page and share the link?

  • @rajunaik8803
    @rajunaik8803 1 year ago +1

    Hi Raja, you are not mentioning dataframe name while performing window definition. Will it automatically point to dataframe created in previous cell? And what if my notebook has multiple dataframes?

    • @rajasdataengineering7585
      @rajasdataengineering7585 1 year ago +1

      Hi Raju,
      A dataframe must be named to apply a window function. I would have defined the window logic separately and applied it to a dataframe later. It is not applied automatically to any dataframe.

    • @rajunaik8803
      @rajunaik8803 1 year ago

      @rajasdataengineering7585 my bad, thanks Raja

    • @travelfoodlife1449
      @travelfoodlife1449 9 months ago +1

      So the window logic is defined first, and when we call it on a dataframe it applies that logic and performs the action, right sir? 🙌

    • @rajasdataengineering7585
      @rajasdataengineering7585 9 months ago

      Yes that's correct

  • @aswaniyettapu9992
    @aswaniyettapu9992 2 years ago +1

    Have you posted videos on sort merge join, auto broadcast join and shuffle join? If not, can you post them?

    • @rajasdataengineering7585
      @rajasdataengineering7585 2 years ago +1

      I have already posted a video on broadcast join:
      ruclips.net/video/HDiXK3Gl-hs/видео.html
      I have yet to post on sort merge and shuffle joins.

  • @sachinchandanshiv7578
    @sachinchandanshiv7578 2 years ago +1

    Hi Sir,
    I am stuck on one scenario. Please assist.
    The input dataframe is as below:
    id description
    1 "abcd
    pqe
    rrr
    qqq"
    2 "ttt
    ppp
    ooo
    www
    iii"
    3 "aaa
    ppp
    eee
    zzz
    rrrr"
    4 "ssss
    jjjj"
    The output dataframe should contain only the last line of each description, as below:
    id description
    1 qqq
    2 iii
    3 rrrr
    4 jjjj

    • @rajasdataengineering7585
      @rajasdataengineering7585 2 years ago

      Hi Sachin, the split function can be used for this requirement.

    • @sachinchandanshiv7578
      @sachinchandanshiv7578 2 years ago +1

      @rajasdataengineering7585 Yes Sir. By using split I am able to separate the lines, but I am unable to extract the last element from the dataframe cell, because the last line can be at any position: 2nd, 3rd, 5th, or any other.

    • @rajasdataengineering7585
      @rajasdataengineering7585 2 years ago

      Try indexing with [-1].
      If that does not work, we can count the number of elements and use that count to pick the last one in a UDF.

    • @sachinchandanshiv7578
      @sachinchandanshiv7578 2 years ago +1

      @rajasdataengineering7585
      Done! Thanks 🙂
      from pyspark.sql.functions import udf
      from pyspark.sql.types import StringType

      def get_last_ele(str1):
          # Split the cell on newlines and return the last line
          return str1.split("\n")[-1]

      stringUDF = udf(get_last_ele, StringType())
      df1.withColumn("last element", stringUDF("name")).show()

    • @rajasdataengineering7585
      @rajasdataengineering7585 2 years ago

      Welcome 🙂
      I will try to post an optimized solution, if there is one, when I get some time.
      Happy to know that you solved this requirement in quick time!

  • @manikanta-zq1yg
    @manikanta-zq1yg 7 months ago

    Thank you