Replacing MapReduce with Spark SQL will obviously speed up the process. I don't see the mentioned queries being optimized, though, since a full scan of the data is still being done, and the business analysts still need to know SQL to write the analysis jobs. IMHO a better option is to write custom Spark jobs, which can be tuned per use case, rather than relying purely on Spark SQL. Expose an API layer with the business logic, which a UI can call to get whatever data/results the BAs need, and refine the API(s) as those needs evolve. Roughly along the lines of the sketch below.
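A minimal sketch of what I mean, assuming events are stored as Parquet partitioned by a `dt` date column and a made-up "orders per region" metric (paths, column names, and the metric are just placeholders, not the presenter's actual pipeline):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RegionOrdersJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RegionOrdersJob")
      .getOrCreate()

    // Prune to the requested date range instead of scanning everything;
    // with a date-partitioned layout this filter avoids the full scan.
    val (from, to) = (args(0), args(1))
    val events = spark.read.parquet("hdfs:///data/events")
      .filter(col("dt").between(from, to))

    // Business logic lives in the job, not in ad-hoc SQL typed by a BA.
    val result = events
      .groupBy("region")
      .agg(count("*").as("orders"), sum("amount").as("revenue"))

    // Persist a small result set that the API layer can serve to the UI.
    result.coalesce(1).write.mode("overwrite")
      .json(s"hdfs:///results/region_orders/$from-$to")

    spark.stop()
  }
}
```

The API then only has to read the pre-computed result and hand it to the UI, so the BAs never touch SQL or the cluster directly.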
Very nice presentation. I would like to know how you are caching the last 3 days of data in memory.