One of several insightful presentations to come out of the DOE Computational Science Graduate Fellowship was delivered by Katie Antypas, Services Department Head, National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory.
In “Preparing Your Application for Advanced Manycore Architectures,” Antypas gives a humorous and on-point overview of major architectural trends in HPC and talks about why they are happening and what users can do to start preparing their codes for the manycore era.
While "multicore" and "manycore" do not have precise definitions, Antypas suggests there is roughly a half order of magnitude difference between the two. Manycore essentially refers to an architecture with a large number of lightweight cores (hint: lightweight means slow). "With manycore architecture, the number of cores is more important than the speed of any given core, and in the multicore era, you are still thinking about single cores, and single thread performance," she says.
Antypas continues by addressing the factors driving this paradigm shift. The community is running up against a number of walls, the most prominent being the memory wall, the power wall, and the parallelism wall. For users, this means more work, because the onus is on them to change their application codes to achieve high performance. Antypas goes over several sources of parallelism, including domain parallelism, thread parallelism, data parallelism, and instruction-level parallelism, and cites reasons to use each of them. Also discussed is the renewed focus on vectorization.
“Regardless of processor architecture, users will need to modify applications to achieve performance,” maintains Antypas. This will require users to:
+ Expose more on-node parallelism in applications
+ Increase application vectorization capabilities
+ Manage hierarchical memory
+ Add locality directives for co-processor architectures
The presentation also includes an overview of the upcoming NERSC system, Cori, which will employ the newest Phi “Knights Landing” processors. Delivery is scheduled for mid-2016.
Antypas points out that knowing the peak performance (3 teraflops) is not a very helpful statistic, but the fact that it is a self-hosted processor, not an accelerator, will be very good for users, who “already have a lot to worry about with finding more threading and parallelism and won’t have to worry about offloading to a coprocessor.”
Her one-hour presentation and slides are available online: