Spark Scenario Based Question | Deal with Ambiguous Column in Spark | Using PySpark | LearntoSpark

  • Published: 11 Dec 2024

Comments • 33

  • @Akshaykumar-pu4vi
    @Akshaykumar-pu4vi 2 years ago +1

    In PySpark we can simply apply
    df_final = df.withColumn("Name", df["name0"]).drop("name0", "name4")
    In newer versions of PySpark, the duplicate columns are displayed with an index suffix by default.
    "Create a new column referencing one of the duplicate columns, then drop those duplicate columns; it should work, I hope" (see the sketch after this thread).
    Thank you so much for this playlist, Sir!

    • @srinivasa1187
      @srinivasa1187 2 years ago

      It did not work for me using PySpark 3.
      Could you write the exact syntax, or refer me to the page where you got it?
      Thanks

    • @localmartian9047
      @localmartian9047 2 years ago

      @@srinivasa1187 It will not work. He is assuming the duplicate columns already have an index concatenated, and then he just keeps one of them and drops the rest.
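
    A minimal runnable sketch of the approach above, assuming (as the reply notes) that the flattened DataFrame already carries index-suffixed duplicates name0 and name4; the toy data here is hypothetical:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      # Hypothetical flattened result in which the duplicate "name"
      # columns already carry positional suffixes: name0 and name4.
      df = spark.createDataFrame(
          [("John", "Laptop", "John")],
          ["name0", "product", "name4"],
      )

      # Keep one copy under a clean name, then drop both suffixed columns.
      df_final = df.withColumn("Name", df["name0"]).drop("name0", "name4")
      df_final.show()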

  • @ashutoshrai5342
    @ashutoshrai5342 4 years ago +1

    Great work. Keep posting new use cases. You will definitely make it big. Thank you.

  • @akshayanand6803
    @akshayanand6803 4 years ago

    Wanted to deal with duplicate columns as well... This is nice

  • @Shiva-kz6tn
    @Shiva-kz6tn 4 years ago +2

    Good one... please post it in Scala as well!

  • @bhaskarreddy-wt5rc
    @bhaskarreddy-wt5rc 3 years ago +1

    Before unwrapping the inner JSON, we can rename the name column and then unwrap the inner JSON, right?
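
    A minimal sketch of that idea, renaming the colliding top-level column before flattening. The input path and the Delivery struct are taken from the Scala reply further down; the customer_name alias is hypothetical:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      df = spark.read.option("multiline", "true").json("input/input1.json")

      # Rename the outer "name" column first, so the "name" field inside
      # the Delivery struct no longer clashes after flattening.
      flat = (
          df.withColumnRenamed("name", "customer_name")
            .select("*", "Delivery.*")
            .drop("Delivery")
      )
      flat.printSchema()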

  • @vuppalanaveenkrishna6070
    @vuppalanaveenkrishna6070 3 years ago

    Thanks Azhar... I did this exercise

  • @sivavalluru3864
    @sivavalluru3864 4 years ago +1

    Nice explanation, keep it up

  • @sushantshekhar9409
    @sushantshekhar9409 3 years ago

    Hi Azarudeen, when I convert the JSON to a DataFrame, one of the ambiguous columns gets a null value... what should I do in that case?

  • @ramyagudivaka4944
    @ramyagudivaka4944 2 years ago

    Thanks for your efforts. Amazing work.
    Could you please put this logic in Spark Scala as well?

  • @bhavitavyashrivastava8600
    @bhavitavyashrivastava8600 4 years ago

    I have made a Python machine learning web app. Can I do the same with PySpark MLlib?
    If yes, then how?
    I have used Heroku for my Python machine learning apps.

  • @ayushmittal3948
    @ayushmittal3948 4 years ago +3

    Can't we rename the columns with this code?
    lst = []
    for i in df_cols:
        if i in lst:
            i = i + "new"
        lst.append(i)
    It checks whether the column already exists and, if it does, appends "new" to it. As simple as that; indirectly you are just counting the occurrences and then appending, so instead we can do the above.

    • @localmartian9047
      @localmartian9047 2 years ago

      Almost works. To make it foolproof in case there are multiple duplicate columns, keep a counter inside the loop and append that instead of "new" (a full sketch follows this thread).

    • @ayushmittal3948
      @ayushmittal3948 2 years ago

      @@localmartian9047 Yes, we can do that as well.

    • @nareshreddy-l3f
      @nareshreddy-l3f 1 year ago

      lst = []
      x = 1
      for i in df2.columns:
          if i in lst:
              i = i + str(x)
              x = x + 1
          lst.append(i)
      print(lst)

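    The loop above only builds the list of deduplicated names; here is a runnable sketch that also applies them back with toDF. The input path and the Delivery struct are assumptions borrowed from the Scala reply further down:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      # Flatten the nested Delivery struct; this is what produces the
      # duplicate column names in the first place.
      df2 = (
          spark.read.option("multiline", "true").json("input/input1.json")
               .select("*", "Delivery.*")
               .drop("Delivery")
      )

      # Suffix a running counter onto every repeated name.
      new_cols = []
      x = 1
      for c in df2.columns:
          if c in new_cols:
              c = c + str(x)
              x = x + 1
          new_cols.append(c)

      # toDF assigns the new names positionally, removing the ambiguity.
      df3 = df2.toDF(*new_cols)
      df3.printSchema()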

  • @SujeetKumarMehta-kk7kw
    @SujeetKumarMehta-kk7kw 1 year ago

    Thank you very much.....

  • @ravikirantuduru1061
    @ravikirantuduru1061 4 years ago

    Can you share a project template for a PySpark project, for submitting a job to a cluster?

  • @ayushmittal3948
    @ayushmittal3948 4 years ago +2

    Can't we rename the columns with this code?
    lst = []
    for i in df_cols:
        if i in lst:
            i = i + "new"
        lst.append(i)
    It checks whether the column already exists and, if it does, appends "new" to it. As simple as that; indirectly you are just counting the occurrences and then appending, so instead we can do the above.
    o/p:
    ['name', 'product', 'address', 'mob', 'namenew']

  • @subramanyams3742
    @subramanyams3742 4 years ago

    Creating our own schema does not help, does it?

  • @aspait
    @aspait 4 years ago

    We can also use the rename column option.

    • @AzarudeenShahul
      @AzarudeenShahul  4 years ago

      The withColumnRenamed option will not remove the ambiguity issue... please try it and let me know.
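
      A minimal sketch of why, using a hypothetical toy DataFrame with two colliding "name" columns: withColumnRenamed matches columns by name, so both copies get renamed and the collision survives.

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()

        # Two columns deliberately share the name "name".
        df = spark.createDataFrame([(1, "a", "b")], ["id", "name", "name"])

        # withColumnRenamed renames every column matching "name",
        # so the two columns still collide under the new name.
        df2 = df.withColumnRenamed("name", "customer_name")
        print(df2.columns)  # ['id', 'customer_name', 'customer_name']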

  • @manojkalyan94
    @manojkalyan94 4 years ago +1

    Bro, can you make a video on unit testing?

  • @ppriya8150
    @ppriya8150 4 years ago

    Sir, could you please explain the same thing in Spark Scala in the next video?

  • @ppriya8150
    @ppriya8150 4 years ago +1

    Hi sir, could you please explain the same in Spark Scala 🙏

    • @AzarudeenShahul
      @AzarudeenShahul  4 years ago

      Sure, will try to make it in Spark Scala as well.

    • @ppriya8150
      @ppriya8150 4 years ago

      @@AzarudeenShahul Thank you, Sir

    • @dippusingh3204
      @dippusingh3204 4 years ago +1

      val df = spark.read.option("multiline", "true").json("input/input1.json")
      // df.show(false)
      // df.printSchema()
      val df0 = df.select("*", "Delivery.*").drop("Delivery")
      df0.show(false)
      var list = df0.schema.map(_.name).toList
      for(i

    • @ppriya8150
      @ppriya8150 3 years ago

      @@dippusingh3204 Thank you so much