On Monday at the Hot Chips 26 conference, Fujitsu revealed details about its upcoming SPARC64 XIfx chips, which the company is counting on to reach the 100-petaflops barrier and revitalize its HPC game. Of the top 10 fastest supercomputers on the current TOP500 list, only one is a Fujitsu system. But with the new 32-core SPARC chips touting more than one teraflops of double-precision performance, Fujitsu has the makings of a winning strategy for reaching beyond the petascale era.
When it comes to the supercomputing arms race, the Japanese chip and server maker should not be underestimated. It was the company’s eight-core SPARC64 VIIIfx processor and custom Tofu interconnect that catapulted the K supercomputer to a record-setting 10 petaflops of LINPACK performance. The new SPARC64 XIfx chip is a follow-up to the SPARC64 IXfx, used in the company’s PrimeHPC FX10 machines. The intervening SPARC64 X chip powered general-purpose servers, but did not have an HPC counterpart.
The new processors have 32 main processing cores and two so-called “assistant” cores, yielding a peak performance of 1.1 teraflops. Compared to the SPARC64 IXfx parts, the new chips have 3.2 times the double-precision floating-point performance and 6.1 times the single-precision performance. According to Fujitsu, the assistant cores will help avoid OS jitter and will handle non-blocking MPI functions.
The post-FX10 system (as yet unnamed) will be constructed of highly integrated components in a high-density package. Company slides depict racks with 216 nodes (12 nodes per chassis, one CPU per node, and 18 chassis per rack), meaning that a peak performance of more than 100 petaflops will be possible in a mere 463-rack footprint. To put the capability of the new machine in perspective, the performance of just one 12-node chassis corresponds to an entire cabinet of the K computer.
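The rack arithmetic can be checked with a short Python sketch. The node and chassis counts come from the figures above; the function name and the per-node values are illustrative assumptions. The 463-rack figure appears to follow from a round 1.0 teraflops per node ("more than one teraflops"), while the quoted 1.1-teraflops peak would need fewer racks.

```python
import math

# Figures quoted from the company slides
NODES_PER_CHASSIS = 12
CHASSIS_PER_RACK = 18

nodes_per_rack = NODES_PER_CHASSIS * CHASSIS_PER_RACK  # 216 nodes


def racks_needed(target_pflops: float, tflops_per_node: float) -> int:
    """Smallest whole number of racks whose combined peak meets the target.

    Hypothetical helper for illustration; tflops_per_node is an assumed
    per-node peak, not an official specification.
    """
    rack_peak_tflops = nodes_per_rack * tflops_per_node
    return math.ceil(target_pflops * 1000 / rack_peak_tflops)


# Assumption: 1.0 teraflops per node (the "more than one teraflops" floor)
print(racks_needed(100, 1.0))
# Assumption: the quoted 1.1-teraflops peak per node
print(racks_needed(100, 1.1))
```

Under the 1.0-teraflops assumption the sketch reproduces the 463-rack footprint; at 1.1 teraflops per node the same target would fit in roughly 421 racks.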
Each water-cooled chassis includes multiple Micron Hybrid Memory Cubes, which stack system memory in a 3D structure to deliver higher throughput and better power efficiency. The new Tofu 2 interconnect increases link bandwidth by 2.5 times, supporting 12.5 gigabytes/second bidirectional communication links. Fujitsu reports that the entire software stack has also been enhanced for the successor to PrimeHPC FX10. Despite the new features, the post-FX10 system maintains an architecture similar to both K and FX10 to promote application compatibility, according to the company.
A 100-petaflops system will have roughly twice the peak performance of the world’s current fastest computer, China’s Tianhe-2 machine, which achieves a peak performance of 54.9 petaflops (and a LINPACK result of 33.86 petaflops) using a hybrid approach that combines Intel’s Xeon CPUs and Xeon Phi coprocessors.
Fujitsu’s plan is to keep evolving the K computer architecture all the way to the exascale horizon, which the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) has targeted for the year 2020. The strategy involves co-design with selected target applications and innovations in the system software stack. Fujitsu is also working with both MEXT and RIKEN, the agency that operates the K computer, on a plan to drive exascale development.