We talked with the designers and, equally importantly, the influencers behind the creation of Allinea Software’s latest product, Allinea MAP.
Launched to major acclaim at SC12, Allinea MAP represents both a step change in performance analysis and a remarkable example of how to engage real users in the design process.
“We worked with a range of leading HPC users across the world. From the get-go we sat down with a rough whiteboard sketch and tried to figure out together what we could do to get more people profiling their codes effectively and without having to spend ages learning how the tool works”, recalls Mark O’Connor, Product Manager at Allinea Software. “Our objective was to allow non-specialist users to get meaningful insight into the performance of their code easily and quickly.”
Figure 1: An MPI profiler that just works, without slowing down your program
This ‘Crowd Sourced’ design process resulted in several unique features. For example, Allinea MAP eschews the classic instrumentation-based MPI timeline in favor of a dynamic sampling engine that scales to tens of thousands of processes while adding less than 5% to the total runtime.
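The low-overhead idea behind sampling can be illustrated with a minimal sketch in ordinary Python. This is not Allinea MAP's engine — just a toy timer-driven sampler (`sample_profile` and `busy_work` are names invented here) showing why cost stays roughly proportional to the sampling interval rather than growing with every function call, as instrumentation does:

```python
import collections
import signal


def sample_profile(func, interval=0.005):
    """Run func() while a timer periodically samples the call stack.

    Returns a Counter mapping "function:line" to sample counts. Hot
    code is hit by more timer ticks, so it accumulates more samples,
    while the program itself runs almost entirely undisturbed.
    """
    samples = collections.Counter()

    def handler(signum, frame):
        # On each tick, record where the program currently is.
        key = "%s:%d" % (frame.f_code.co_name, frame.f_lineno)
        samples[key] += 1

    old_handler = signal.signal(signal.SIGALRM, handler)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # stop the timer
        signal.signal(signal.SIGALRM, old_handler)
    return samples


def busy_work():
    # A deliberately slow loop to act as the "hotspot".
    total = 0
    for i in range(3_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    counts = sample_profile(busy_work)
    for location, n in counts.most_common(3):
        print(n, "samples at", location)
```

A real tool like Allinea MAP does far more (merging samples across MPI ranks, attributing time to source lines and communication), but the core trade-off is the same: a few hundred samples per second cost almost nothing, whereas instrumenting every call in a tight loop can dominate the runtime being measured.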
“At scale the old way of doing things just does not work. We also knew from our users that a common cognitive model across products is vital to user adoption,” comments David Lecomber, CTO at Allinea Software. He adds, “we already had a scalable launch and merge infrastructure in our debugger product, Allinea DDT, that is proven at 705,000 processes. Adopting this gave us a common visual interface over the entire Allinea environment. Plus it saved development time, which allowed us to focus more of our efforts on the user experience.”
As the designers of major systems such as Blue Waters cast doubt on simple benchmarks as a measure of system performance, the spotlight falls on how users can get their jobs done in less time and with fewer machine resources. Allinea MAP is key to achieving these goals.
The real issue in HPC, today and tomorrow, is not how much ‘tin’ can be afforded, but how well the development environment gets users to their application objectives quickly, easily and cost-effectively.
Figure 2: Get results right away, see at a glance which lines of code are slow
The Allinea environment has been turning some influential heads across the globe. “I really liked seeing the performance information directly in the source code,” said Helen He of the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, Department of Energy. “It’s very cool,” agrees Richard Gerber, Deputy Group Lead for the NERSC User Services Group, “we need a tool that helps users to help themselves.”
The rallying-cry around a more usable, reliable profiling tool was repeated across the Atlantic. “When we profile code, the simplest metrics are sometimes the trickiest to collect. A lightweight tool like Allinea MAP lets us get this key performance data faster,” said Steven Jarvis, head of the High Performance Systems Group at Warwick University, England, whose group reviewed and advised on several iterations of Allinea MAP.
Allinea Software’s ‘Crowd Sourced’ development hasn’t stopped at launch. Customers who signed up during SC12 received exclusive invitations to an extended development phase to fit the product to their users’ needs. For example, with Allinea DDT already supporting Intel Xeon Phi, Allinea Software will be working with several key sites that expect Allinea MAP support for Phi to follow in short order.
Figure 3: Deepen your understanding, learn why each hotspot exists
“The use of a common platform lets us execute our roadmap at a very aggressive pace,” explains David Lecomber. “It’s enabling us to add support for almost every platform, and OpenMP and threads to Allinea MAP, in a matter of months rather than years. We’ll be making some big announcements for both products later this year!”
“Developing a product in parallel with key labs like this is the only way to work with today’s product lifecycle,” adds Mark O’Connor, “but then I guess parallelizing a process to get better results faster just comes naturally to those who work in HPC.”