Really talented candidate.
To reduce shuffling, we can use reduceByKey.
In general, narrow transformations should be preferred. If an unavoidable wide transformation like a join, groupBy, or distinct is needed, we can filter the data first and then perform the wide transformation, so that unwanted data is dropped early.
That way fewer records will be shuffled.
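A minimal PySpark sketch of both points (the dataset and column names are made up): reduceByKey combines values per key on each partition before the shuffle, and filtering before the join means fewer rows reach the wide transformation.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("reduce-shuffle-sketch").getOrCreate()
sc = spark.sparkContext

# reduceByKey aggregates locally on each partition before shuffling,
# unlike groupByKey, which ships every record across the network.
sales = sc.parallelize([("IN", 10), ("US", 20), ("IN", 5), ("US", 7)])
totals = sales.reduceByKey(lambda a, b: a + b)

# For an unavoidable wide transformation (here a join), filter first
# so only the needed records are shuffled.
orders = spark.createDataFrame(
    [(1, "IN", 100), (2, "US", 250), (3, "IN", 30)],
    ["order_id", "country", "amount"])
countries = spark.createDataFrame(
    [("IN", "India"), ("US", "United States")],
    ["country", "name"])

big_orders = orders.filter(orders.amount > 50)     # prune rows before the join
joined = big_orders.join(countries, on="country")  # fewer records get shuffled

print(totals.collect())
joined.show()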
Very nice content
Good one
Sir, I have 2.5 years of experience as a manual test engineer, but rather than automation I am thinking of switching to big data testing, so I ended up doing a certification course as well. How should I apply for interviews?
Which certification did you do?
HDFS for 128 MB and local for 32 MB.
Can I attend this mock interview?
Please give me the answer to "take a 100 MB file, read its contents, and write them into another file 5 times."
How can we write it 5 times...?
When you write this file, you can use .repartition(5).
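A short PySpark sketch of that suggestion (the paths are hypothetical): .repartition(5) reshuffles the data into 5 partitions, so the write produces 5 part-files in the output directory.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repartition-write-sketch").getOrCreate()

# Read the (hypothetical) 100 MB text file and write it back out in 5 partitions.
df = spark.read.text("/data/input_100mb.txt")
df.repartition(5).write.mode("overwrite").text("/data/output_dir")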
count=5
while [ $count -gt 0 ];
do cat Test/Test.txt >> Test/Test2.txt;
count=$((count-1));
done
Use the cp command in a shell script.
Default block size in the local file system (LFS) is 4 KB.
That guy was smart.
Use rank functions to delete duplicates.
No bro, use row_id to delete duplicate records.
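A hedged PySpark sketch of the window-function approach mentioned above (row_number() is used here in place of rank/row_id, and the data and column names are made up): number the rows inside each duplicate group and keep only the first one.

from pyspark.sql import SparkSession
from pyspark.sql.functions import row_number
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("dedup-sketch").getOrCreate()

df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], ["id", "val"])

# Assign a row number within each group of identical rows,
# then keep only the first row of every group.
w = Window.partitionBy("id", "val").orderBy("id")
deduped = (df.withColumn("rn", row_number().over(w))
             .filter("rn = 1")
             .drop("rn"))
deduped.show()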
Could you please provide feedback on this interview?
You will get interview feedback in future videos. Thanks for watching.
How do I apply for a mock interview?
Send resume to shareit2904@gmail.com