Great talk, both of you.
Very nice presentation!! 👏👏👏
@22:13 - where can I find an example implementation with the SQL API?
Can you please provide the link to the benchmark on GitHub?
Go to 23:25 in the video; he shows the GitHub URL there.
Hi, I am facing a skewed-data issue in my Spark application. I have two tables of the same size (same number of rows, different numbers of columns), and I am checking which rows of table A are not in table B. This Spark SQL query is taking a lot of time.
I have given it 100 executors in the production environment, and I also tried writing both tables out to files (to avoid in-memory processing of such a large dataset) and reading them back in to run the SQL operation.
My application contains a lot of Spark SQL operations, and this query sits somewhere in the middle of the whole pipeline. When I run the application, it runs fine up to this query and then takes more than 6 hours to process 2M records.
How can I achieve a faster result with repartitioning or iterative broadcast? Please help.
Hi VIshakh, did you find a solution to the problem you mentioned?
@arpangrwl May I know the solution? What needed to be done?
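Not from the talk, but a common pattern for this: rewrite the NOT IN as a left anti join, and if table B's keys are too large for a single broadcast, anti-join against broadcast chunks of the keys iteratively. A minimal sketch, assuming the tables are registered as table_a / table_b and the join key is id (all placeholder names):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, col, hash, lit, pmod}

val spark = SparkSession.builder.appName("anti-join-sketch").getOrCreate()

// Placeholder tables and key column -- substitute your own.
val tableA = spark.table("table_a")
val tableB = spark.table("table_b")

// If B's distinct keys fit in one broadcast, this alone avoids shuffling A:
// val result = tableA.join(broadcast(tableB.select("id").distinct()),
//                          Seq("id"), "left_anti")

// Iterative broadcast: split B's keys into chunks by hash, then
// anti-join A against each broadcast chunk in turn. Each pass drops
// the rows of A whose key appears in that chunk.
val numChunks = 8 // tune to what fits comfortably in memory
var remaining = tableA
for (i <- 0 until numChunks) {
  val keyChunk = tableB.select("id").distinct()
    .filter(pmod(hash(col("id")), lit(numChunks)) === i)
  remaining = remaining.join(broadcast(keyChunk), Seq("id"), "left_anti")
}

// remaining now holds the rows of A whose id never appears in B.
remaining.write.mode("overwrite").parquet("/tmp/a_not_in_b")
```

Because only a small chunk of keys is shipped to the executors on each pass, A is never shuffled, so a skewed key cannot overload a single partition. If the loop gets long, checkpointing `remaining` every few iterations keeps the lineage manageable.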
Try bucketing both tables on the join key before writing. The write may take longer, but the join will be faster (rough sketch below).
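A minimal sketch of that, again assuming placeholder names table_a / table_b and join key id, with an arbitrary bucket count; note that bucketBy only works with saveAsTable:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
val tableA = spark.table("table_a") // placeholder names
val tableB = spark.table("table_b")

// Bucket both tables on the join key at write time. Matching bucket
// counts on the same key let Spark co-locate matching rows.
tableA.write
  .bucketBy(64, "id")
  .sortBy("id")
  .mode("overwrite")
  .saveAsTable("table_a_bucketed")

tableB.write
  .bucketBy(64, "id")
  .sortBy("id")
  .mode("overwrite")
  .saveAsTable("table_b_bucketed")

// Joining the bucketed tables on "id" can then skip the full shuffle.
val joined = spark.table("table_a_bucketed")
  .join(spark.table("table_b_bucketed"), Seq("id"), "left_anti")
```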
Check this: ruclips.net/video/HIlfO1pGo0w/видео.html