Breaking application execution down into five phases (0-4) can help in spotting runtime bottlenecks.
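As a rough illustration of the idea, the sketch below times a series of named phases and reports the slowest one. The phase names and workloads are hypothetical stand-ins, not part of the actual product:

```python
import time

def run_phases(phases):
    """Run (name, fn) pairs sequentially, recording wall-clock time per phase."""
    timings = {}
    for name, fn in phases:
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    return timings

# Hypothetical phases 0-4; sleep stands in for real work such as
# ingest, transform, join, aggregate, and write.
durations = [0.01, 0.01, 0.02, 0.05, 0.01]
timings = run_phases(
    [(f"phase {i}", lambda d=d: time.sleep(d)) for i, d in enumerate(durations)]
)
slowest = max(timings, key=timings.get)
```

With timings in hand per phase, the phase with the largest share of the run time is the first place to look for a bottleneck.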
In this case, the default Spark UI (bottom) suggests that phase 3 is the bottleneck, consuming a lot of CPU.
Pepperdata's custom UI (top), however, makes it clear that phases 2 and 3 run in parallel: phase 3 completes first, while phase 2 keeps running and continues to consume CPU.
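The pitfall here is general: when phases overlap, a view that attributes resource usage to the most recently finished phase can blame the wrong one. A minimal sketch of the situation, using threads and sleeps as stand-ins for the concurrent phases (the phase names and durations are illustrative assumptions):

```python
import threading
import time

def phase(name, duration, log):
    """Record the start and end time of a simulated phase."""
    start = time.perf_counter()
    time.sleep(duration)  # stand-in for CPU-consuming work
    log.append((name, start, time.perf_counter()))

log = []
# Phases 2 and 3 start together; phase 3 finishes first, while
# phase 2 keeps running (and, in the real case, keeps using CPU).
t2 = threading.Thread(target=phase, args=("phase 2", 0.2, log))
t3 = threading.Thread(target=phase, args=("phase 3", 0.05, log))
t2.start(); t3.start()
t2.join(); t3.join()

ends = {name: end for name, _, end in log}
# Phase 2 outlives phase 3, so CPU consumed after phase 3 completes
# belongs to phase 2, not to "the phase that just finished".
phase2_outlives_phase3 = ends["phase 2"] > ends["phase 3"]
```

A timeline view that shows phases side by side, rather than as a single sequence, is what makes this overlap visible.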
Run times are not consistent across two runs of the same application.
This is due to cluster weather: in the second run, less memory was available to the application because other applications were running at the same time.
Pepperdata Code Analyzer for Apache Spark is aimed at engineers and complements Pepperdata's existing line of products.