This is very helpful thank you!
Awesome presentation. Really useful
Is there a repository we can go through with real examples of badly vs. well written Spark SQL?
Awesome presentation :)
Thanks much !!! Very useful
Why is HashMergeJoin not mentioned in the presentation?
Why is a Spark query translated into multiple Spark jobs?
Every job is a piece of work executed by the executors on a cluster. A query is analyzed and translated into one or more jobs (roughly one per action). Each job is then split into stages at shuffle boundaries, and each stage is split into tasks, one per partition, which can be parallelized and pipelined for best efficiency.
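As a rough sketch of that idea (plain Python, not Spark's actual scheduler code; the operator names and the `plan` structure here are made up for illustration): stages are cut wherever a wide transformation forces a shuffle, and each stage fans out into one task per partition.

```python
# Toy model of Spark's planning: a linear "plan" is a list of
# (operator, is_wide) pairs, where is_wide marks a shuffle boundary.

def split_into_stages(plan):
    """Cut the plan into stages; a new stage starts after every
    wide (shuffle) transformation."""
    stages, current = [], []
    for op, is_wide in plan:
        current.append(op)
        if is_wide:              # shuffle boundary closes the stage
            stages.append(current)
            current = []
    if current:
        stages.append(current)
    return stages

def tasks_for(stage, num_partitions):
    """One task per partition: the same stage code runs in parallel
    on every partition of the data."""
    return [f"{'+'.join(stage)}@part{p}" for p in range(num_partitions)]

# A query like: scan -> filter -> groupBy (shuffle) -> aggregate -> write
plan = [("scan", False), ("filter", False),
        ("exchange", True),      # the groupBy triggers a shuffle
        ("aggregate", False), ("write", False)]

stages = split_into_stages(plan)
print(stages)  # [['scan', 'filter', 'exchange'], ['aggregate', 'write']]
print(tasks_for(stages[1], 3))
```

So a single query can yield several jobs, each job several stages, and each stage as many tasks as there are partitions, which is where the parallelism comes from.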