71. Databricks | Pyspark | Window Functions: Lead and Lag

  • Published: 19 Oct 2024
  • Azure Databricks Learning: Window Functions - Lead and Lag
    ==================================================
    What are window functions, and what are the lead and lag functions used for?
    Window functions are among the most commonly used data transformations in Databricks development. They split data into partitions and apply a computation over each partition while still returning a value for every input row.
    This video covers the lead and lag functions in detail: within an ordered partition, lag returns a column's value from a preceding row and lead returns it from a following row.
    #WindowFunctions, #DatabricksWindowFunction, #DatabricksWindow, #DatabricksWindowLead, #DatabricksWindowLag, #WindowFunctionsLeadLag, #LeadLag, #PysparkLeadLag, #PysparkLead, #PysparkLag, #DatabricksRealtime, #SparkRealTime, #DatabricksInterviewQuestion, #DatabricksInterview, #SparkInterviewQuestion, #SparkInterview, #PysparkInterviewQuestion, #PysparkInterview, #BigdataInterviewQuestion, #BigDataInterview, #PysparkPerformanceTuning, #PysparkPerformanceOptimization, #PysparkPerformance, #PysparkOptimization, #PysparkTuning, #DatabricksTutorial, #AzureDatabricks, #Databricks, #Pyspark, #Spark, #AzureADF, #LearnPyspark, #LearnDataBRicks, #DataBricksTutorial, #azuredatabricks, #notebook, #Databricksforbeginners

Comments • 22

  • @jagadeeswaran330 · 7 months ago · +1

    Great explanation sir, thank you!

  • @sravankumar1767 · 2 years ago · +2

    Nice explanation, Raja 👌 👍 👏

  • @sharmadtadkodkar2038 · 11 months ago · +1

    Great content, but I'm not able to find all your videos on your page. Could you please put them all on one page and share the link?

  • @rajunaik8803 · 1 year ago · +1

    Hi Raja, you don't mention the DataFrame name when defining the window. Will it automatically point to the DataFrame created in the previous cell? And what if my notebook has multiple DataFrames?

    • @rajasdataengineering7585 · 1 year ago · +1

      Hi Raju,
      A DataFrame must be specified to apply a window function. I defined the window logic separately and applied it to a DataFrame later. It is not applied automatically to any DataFrame.

    • @rajunaik8803 · 1 year ago

      @rajasdataengineering7585 My bad, thanks Raja!

    • @travelfoodlife1449 · 6 months ago · +1

      So the logic is defined first, and when we call it on a DataFrame, it applies the logic and performs the action, right sir? 🙌

    • @rajasdataengineering7585 · 6 months ago

      Yes, that's correct.

  • @manikanta-zq1yg · 3 months ago

    Thank you

  • @aswaniyettapu9992 · 2 years ago · +1

    Have you posted videos on sort-merge join, auto broadcast join, and shuffle join? If not, could you post them?

    • @rajasdataengineering7585 · 2 years ago · +1

      I have already posted a video on broadcast join:
      ruclips.net/video/HDiXK3Gl-hs/видео.html
      I have yet to post on sort-merge and shuffle join.

  • @sachinchandanshiv7578 · 1 year ago · +1

    Hi Sir,
    I'm stuck on one scenario. Please assist.
    The input DataFrame is as below (each description is a multi-line string):
    id  description
    1   "abcd\npqe\nrrr\nqqq"
    2   "ttt\nppp\nooo\nwww\niii"
    3   "aaa\nppp\neee\nzzz\nrrrr"
    4   "ssss\njjjj"
    The output DataFrame should contain only the last line of each description:
    id  description
    1   qqq
    2   iii
    3   rrrr
    4   jjjj

    • @rajasdataengineering7585 · 1 year ago

      Hi Sachin, the split function can be used for this requirement.

    • @sachinchandanshiv7578 · 1 year ago · +1

      @rajasdataengineering7585 Yes Sir. Using split, I am able to separate the lines, but I am unable to extract the last element from the DataFrame cell, because the last line can be at any position: 2nd, 3rd, 5th, or any other.

    • @rajasdataengineering7585 · 1 year ago

      Try using index [-1].
      If that doesn't work, we can calculate the number of elements and take the element at the highest index using a UDF.

    • @sachinchandanshiv7578 · 1 year ago · +1

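      @rajasdataengineering7585
      Done! Thanks 🙂
      from pyspark.sql.functions import udf
      from pyspark.sql.types import StringType
      # Return the last line of a multi-line string (None-safe)
      def get_last_ele(str1):
          if str1 is None:
              return None
          return str1.split("\n")[-1]
      # Wrap the Python function as a Spark UDF that returns a string
      stringUDF = udf(get_last_ele, StringType())
      df1.withColumn("last element", stringUDF("name")).show()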

    • @rajasdataengineering7585 · 1 year ago

      You're welcome 🙂
      I will try to post an optimized solution, if there is one, when I get some time.
      Happy to know that you solved this requirement so quickly!