Spark Scenario Based Question | Deal with Ambiguous Column in Spark | Using PySpark | LearntoSpark

  • Published: 11 Dec 2024

Comments • 33

  • @Akshaykumar-pu4vi
    @Akshaykumar-pu4vi 2 years ago +1

    In PySpark we can simply apply
    df_final = df.withColumn("Name", df["name0"]).drop("name0", "name4")
    In newer versions of PySpark, the duplicate columns are displayed with an index suffix by default.
    "Create a new column referencing one of the duplicate columns, then drop those duplicate columns; it should work, I hope" (see the sketch after this thread).
    Thank you so much for this playlist, Sir!

    • @srinivasa1187
      @srinivasa1187 2 years ago

      It did not work for me using PySpark 3.
      Could you write the exact syntax, or refer me to the page where you got it?
      Thanks

    • @localmartian9047
      @localmartian9047 2 years ago

      @@srinivasa1187 It will not work. He is assuming the duplicate columns already have an index concatenated, and then he just keeps one of them and drops the rest.
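
    A minimal runnable sketch of the approach above, assuming (as the reply notes) that the flattened DataFrame already carries index-suffixed duplicates name0 and name4; the toy data here is hypothetical:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      # Hypothetical flattened result in which the duplicate "name"
      # columns already carry positional suffixes: name0 and name4.
      df = spark.createDataFrame(
          [("John", "Laptop", "John")],
          ["name0", "product", "name4"],
      )

      # Keep one copy under a clean name, then drop both suffixed columns.
      df_final = df.withColumn("Name", df["name0"]).drop("name0", "name4")
      df_final.show()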

  • @ashutoshrai5342
    @ashutoshrai5342 4 years ago +1

    Great work. Keep posting new use cases. You will definitely make it big. Thank you.

  • @akshayanand6803
    @akshayanand6803 4 years ago

    Wanted to deal with duplicate columns as well... This is nice

  • @Shiva-kz6tn
    @Shiva-kz6tn 4 years ago +2

    Good one... please post it in Scala as well!

  • @bhaskarreddy-wt5rc
    @bhaskarreddy-wt5rc 3 years ago +1

    Before unwrapping the inner JSON, we can rename the name column and then unwrap the inner JSON, right?
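
    A minimal sketch of that idea, renaming the colliding top-level column before flattening. The input path and the Delivery struct are taken from the Scala reply further down; the customer_name alias is hypothetical:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      df = spark.read.option("multiline", "true").json("input/input1.json")

      # Rename the outer "name" column first, so the "name" field inside
      # the Delivery struct no longer clashes after flattening.
      flat = (
          df.withColumnRenamed("name", "customer_name")
            .select("*", "Delivery.*")
            .drop("Delivery")
      )
      flat.printSchema()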

  • @vuppalanaveenkrishna6070
    @vuppalanaveenkrishna6070 3 years ago

    Thanks Azhar... I did this exercise

  • @sivavalluru3864
    @sivavalluru3864 4 years ago +1

    Nice explanation, keep it up

  • @sushantshekhar9409
    @sushantshekhar9409 3 years ago

    Hi Azarudeen, when I convert the JSON to a DataFrame, one of the ambiguous columns gets a null value... what should I do in that case?

  • @ramyagudivaka4944
    @ramyagudivaka4944 2 years ago

    Thanks for your efforts. Amazing work.
    Could you please put this logic in Spark Scala as well?

  • @bhavitavyashrivastava8600
    @bhavitavyashrivastava8600 4 years ago

    I have made a Python machine learning web app. Can I do the same with PySpark MLlib?
    If yes, then how?
    I have used Heroku for my Python machine learning apps.

  • @ayushmittal3948
    @ayushmittal3948 4 years ago +3

    Can't we rename the columns with this code?
    lst = []
    for i in df_cols:
        if i in lst:
            i = i + "new"
        lst.append(i)
    It checks whether the column already exists and, if it does, appends "new" to it. As simple as that; indirectly you are just counting the occurrences and then appending, so instead we can do the above.

    • @localmartian9047
      @localmartian9047 2 years ago

      Almost works. To make it foolproof in case there are multiple duplicate columns, keep a counter inside the loop and append that instead of "new" (a full sketch follows this thread).

    • @ayushmittal3948
      @ayushmittal3948 2 years ago

      @@localmartian9047 Yes, we can do that as well.

    • @nareshreddy-l3f
      @nareshreddy-l3f 1 year ago

      lst = []
      x = 1
      for i in df2.columns:
          if i in lst:
              i = i + str(x)
              x = x + 1
          lst.append(i)
      print(lst)

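    The loop above only builds the list of deduplicated names; here is a runnable sketch that also applies them back with toDF. The input path and the Delivery struct are assumptions borrowed from the Scala reply further down:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      # Flatten the nested Delivery struct; this is what produces the
      # duplicate column names in the first place.
      df2 = (
          spark.read.option("multiline", "true").json("input/input1.json")
               .select("*", "Delivery.*")
               .drop("Delivery")
      )

      # Suffix a running counter onto every repeated name.
      new_cols = []
      x = 1
      for c in df2.columns:
          if c in new_cols:
              c = c + str(x)
              x = x + 1
          new_cols.append(c)

      # toDF assigns the new names positionally, removing the ambiguity.
      df3 = df2.toDF(*new_cols)
      df3.printSchema()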

  • @SujeetKumarMehta-kk7kw
    @SujeetKumarMehta-kk7kw 1 year ago

    Thank you very much.....

  • @ravikirantuduru1061
    @ravikirantuduru1061 4 years ago

    Can you share a project template for a PySpark project, for submitting a job to a cluster?

  • @ayushmittal3948
    @ayushmittal3948 4 years ago +2

    Can't we rename the columns with this code?
    lst = []
    for i in df_cols:
        if i in lst:
            i = i + "new"
        lst.append(i)
    It checks whether the column already exists and, if it does, appends "new" to it. As simple as that; indirectly you are just counting the occurrences and then appending, so instead we can do the above.
    o/p:
    ['name', 'product', 'address', 'mob', 'namenew']

  • @subramanyams3742
    @subramanyams3742 4 years ago

    Creating our own schema does not help, does it?

  • @aspait
    @aspait 4 years ago

    We can also use the rename column option.

    • @AzarudeenShahul
      @AzarudeenShahul  4 years ago

      The withColumnRenamed option will not remove the ambiguity issue... please try it and let me know.
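
      A minimal sketch of why, using a hypothetical toy DataFrame with two colliding "name" columns: withColumnRenamed matches columns by name, so both copies get renamed and the collision survives.

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()

        # Two columns deliberately share the name "name".
        df = spark.createDataFrame([(1, "a", "b")], ["id", "name", "name"])

        # withColumnRenamed renames every column matching "name",
        # so the two columns still collide under the new name.
        df2 = df.withColumnRenamed("name", "customer_name")
        print(df2.columns)  # ['id', 'customer_name', 'customer_name']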

  • @manojkalyan94
    @manojkalyan94 4 years ago +1

    Bro, can you make a video on unit testing?

  • @ppriya8150
    @ppriya8150 4 years ago

    Sir, could you please explain the same thing in Spark Scala in the next video?

  • @ppriya8150
    @ppriya8150 4 years ago +1

    Hi sir, could you please explain the same in Spark Scala 🙏

    • @AzarudeenShahul
      @AzarudeenShahul  4 years ago

      Sure, will try to make it in Spark Scala as well.

    • @ppriya8150
      @ppriya8150 4 years ago

      @@AzarudeenShahul Thank you, Sir

    • @dippusingh3204
      @dippusingh3204 4 years ago +1

      val df = spark.read.option("multiline", "true").json("input/input1.json")
      // df.show(false)
      // df.printSchema()
      val df0 = df.select("*", "Delivery.*").drop("Delivery")
      df0.show(false)
      var list = df0.schema.map(_.name).toList
      for(i

    • @ppriya8150
      @ppriya8150 3 years ago

      @@dippusingh3204 Thank you so much