In PySpark we can simply apply
df_final = df.withColumn("Name", df["name0"]).drop("name0", "name4")
In newer versions of PySpark it will display the duplicate columns with an index suffix by default.
"Create a new column referencing any one of the duplicate columns and drop those duplicate columns; it should work, I hope."
Thank you so much for this playlist, Sir!
It did not work for me, using PySpark 3.
Could you write the exact syntax,
or
refer me to any page where you found it?
Thanks
@@srinivasa1187 it will not work. Here he is assuming the duplicate columns already have an index concatenated, and is then just keeping one of them and dropping the rest.
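For context, a minimal sketch of what that comment assumes (hypothetical data; the suffixed names name0 and name4 stand in for columns the reader has already disambiguated):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical frame whose duplicate "name" columns already carry
# positional suffixes, as the comment above assumes.
df = spark.createDataFrame(
    [("Azar", "pen", "Azar S")],
    ["name0", "product", "name4"],
)

# Keep one suffixed column as "Name" and drop both originals.
df_final = df.withColumn("Name", df["name0"]).drop("name0", "name4")
df_final.show()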
Great work. Keep posting new use cases. You will definitely make it big. Thank you
Wanted to deal with duplicate columns as well... This is nice
Good one.. please post it in Scala as well!
Before unwrapping the inner JSON, we can rename the name column and then unwrap the inner JSON, right?
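That should work; a minimal sketch of the idea, assuming the layout from the video (a top-level name field plus a Delivery struct that also contains name; the new column name customer_name is made up):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.option("multiline", "true").json("input/input1.json")

# Rename the top-level column *before* expanding the struct,
# so the flattened frame never holds two columns named "name".
df_renamed = df.withColumnRenamed("name", "customer_name")
df_flat = df_renamed.select("*", "Delivery.*").drop("Delivery")
df_flat.show(truncate=False)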
Thanks Azhar... I did this exercise
Great job!
Nice explanation, keep it up
Hi Azarudeen, when I am converting the JSON to a data frame, one of the ambiguous columns is getting a null value... what should I do in that case?
Thanks for your efforts. Amazing work
Could you please put this logic in Spark Scala also?
I have made a Python machine learning web app; can I do the same with PySpark MLlib?
If yes, then how?
I have used Heroku for my Python machine learning apps.
Can't we rename the columns with this code:
lst = []
for i in df_cols:
    if i in lst:
        i = i + "new"
    lst.append(i)
It will check if the column already exists and, if it does, append "new" to it. As simple as that; indirectly you are just counting the occurrences and then appending, so instead of that we can do the above.
o/p
['name', 'product', 'address', 'mob', 'namenew']
Almost works. To make it foolproof in case there are multiple duplicate columns, keep a counter inside the loop and append that instead of "new".
@@localmartian9047 yes, we can do that as well
lst = []
x = 1
for i in df2.columns:
    if i in lst:
        i = i + str(x)
        x = x + 1
    lst.append(i)
print(lst)
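As a follow-up, a minimal sketch (hypothetical data; df2 stands in for the flattened frame from the video) showing how to push the renamed list back onto the DataFrame with toDF, which assigns names positionally and so never has to reference an ambiguous column by name:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical flattened frame with a duplicate "name" column.
df2 = spark.createDataFrame([("a", "b", "c")], ["name", "product", "name"])

lst = []
x = 1
for i in df2.columns:
    if i in lst:
        i = i + str(x)
        x = x + 1
    lst.append(i)

# Reassign every column name positionally in one shot.
df2_fixed = df2.toDF(*lst)
df2_fixed.printSchema()  # name, product, name1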
Thank you very much .....
Can you share a project template for a PySpark project, to submit a job in a cluster?
Creating our own schema does not help, does it?
We can also use the rename column option
The withColumnRenamed option will not remove the ambiguity issue.. please try and let me know
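A quick sketch of why renaming falls short here (hypothetical data): once two columns are both literally named "name", withColumnRenamed matches and renames every one of them, so they stay ambiguous, while toDF can break the tie because it assigns names by position.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical frame with two columns both named "name".
df = spark.createDataFrame([("a", "b")], ["name", "name"])

# withColumnRenamed renames *all* matching columns, so both
# duplicates become "customer" and the ambiguity remains.
print(df.withColumnRenamed("name", "customer").columns)  # ['customer', 'customer']

# toDF assigns names positionally, so it can disambiguate.
print(df.toDF("name", "delivery_name").columns)  # ['name', 'delivery_name']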
Bro, can you make a video on unit testing?
Sure bro.. you can expect one soon
Sir, could you please explain the same thing in Spark Scala in the next video?
Hi sir, could you please explain the same in Spark Scala 🙏
Sure, will try to make it in Spark Scala as well
@@AzarudeenShahul Thank you, Sir
val df = spark.read.option("multiline", "true").json("input/input1.json")
//df.show(false)
//df.printSchema()
val df0 = df.select("*", "Delivery.*").drop("Delivery")
df0.show(false)
var list = df0.schema.map(_.name).toList
// Suffix a counter onto each duplicate name, then reassign all names positionally
val fixed = scala.collection.mutable.ListBuffer[String]()
var x = 1
for (i <- list) {
  if (fixed.contains(i)) { fixed += i + x; x += 1 } else fixed += i
}
val df1 = df0.toDF(fixed: _*)
@@dippusingh3204 thank you so much