Japan, the island nation renowned for its energy- and space-saving design prowess, just nabbed the top three spots on the latest Green500 list and claimed eight of the top 20 spots. The Shoubu supercomputer from RIKEN took top honors in the 17th edition of the twice-yearly listing, becoming the first TOP500-level supercomputer to surpass the seven gigaflops-per-watt milestone.
Green500 list founder Wu-chun Feng, who goes by “Wu,” notes that, as a small nation from a population standpoint, Japan cannot afford to waste critical power resources or use them inefficiently. “It’s a little deja vu,” he says, referring to the automobile landscape of the 1970s and 1980s, “it’s not that we as US people don’t care about power, but we have less constraints put on us when it comes to developing supercomputers. In Japan it’s all about efficiency, space efficiency and power efficiency.”
The same kind of innovation was seen in the automobile space where Japan led the world in designing and popularizing smaller and more fuel-efficient designs, leading up to the iconic green vehicle, the Toyota Prius.
Expectations were that a machine on this list would pass the six gigaflops-per-watt threshold. The previous green supercomputing champ, L-CSC from the GSI Helmholtz Center, was the first to break the five gigaflops-per-watt barrier when it achieved 5.27 gigaflops-per-watt on the November 2014 list. Shoubu went well beyond expectations, reaching 7.03 gigaflops-per-watt.
Shoubu was followed closely by two machines from the High Energy Accelerator Research Organization (KEK): Suiren Blue, which took second place with 6.84 gigaflops-per-watt, and Suiren, which claimed third place with 6.22 gigaflops-per-watt. All three of these machines have the distinction of being the first 6+ gigaflops-per-watt systems on the TOP500/Green500 lists, and all three were the result of a collaborative effort between fabless Japanese startup PEZY and immersion cooling company ExaScaler.
All were built using PEZY’s second-generation 1,024-core custom MIMD processor and ExaScaler’s submersion liquid cooling technology. The lead machine, Shoubu, employs ExaScaler’s second-generation technology along with Intel’s Xeon E5-2618L v3 processor (8 cores / 16 threads, 2.3GHz to 3.4GHz), equipped with 64GB memory and InfiniBand FDR. The “PEZY-SC” accelerator processor is said to offer 3 teraflops single-precision and 1.5 teraflops double-precision performance. Shoubu has a theoretical peak performance of 842.96 teraflops and a measured LINPACK of 412.67 teraflops, sufficient for the 160th spot on the latest TOP500.
Suiren Blue and Suiren are smaller, less-performant machines, with the former achieving 193.91 teraflops LINPACK for a TOP500 ranking of 392 and the latter delivering 206.57 teraflops LINPACK for a placement of 366.
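The gap between Shoubu's theoretical peak and its measured LINPACK result can be expressed as a simple ratio. The following is a minimal sketch using only the two figures quoted above; the function name `hpl_efficiency` is a hypothetical helper chosen for illustration, not something taken from the TOP500 or Green500 tooling.

```python
def hpl_efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Fraction of theoretical peak achieved on LINPACK (hypothetical helper)."""
    if rpeak_tflops <= 0:
        raise ValueError("theoretical peak must be positive")
    return rmax_tflops / rpeak_tflops


# Shoubu figures as quoted in the article (teraflops).
shoubu_efficiency = hpl_efficiency(rmax_tflops=412.67, rpeak_tflops=842.96)
print(f"Shoubu reaches {shoubu_efficiency:.1%} of its theoretical peak")
```

By this measure Shoubu sustains a little under half of its theoretical peak on LINPACK.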
PEZY’s performance aspirations are no secret if you know what the company’s name stands for: PEZY = Peta, Exa, Zetta, Yotta. The company was established in 2010 but came to the attention of the supercomputing community last year with the debut of its first TOP500 machine, Suiren. The team tried hard for a Green500 victory and fell just short, list founder Wu reports. Interestingly, the gains in performance and energy efficiency this year appear to have been achieved without any hardware changes. The TOP500 specs for each list (November 2014 and June 2015) show a machine with the same core count (262,784) and the same theoretical peak performance (373.02 teraflops).
But the LINPACK result did change, from 178.1 to 206.6 teraflops, sufficient to boost Suiren’s TOP500 ranking from 369 to 366, and the machine’s total power use fell from 37.83 kW to 32.59 kW. Together, these improvements brought the machine’s energy efficiency up from 4.95 gigaflops-per-watt to 6.22 gigaflops-per-watt. If the PEZY/ExaScaler partnership didn’t have two other systems in the running, the top spot would have been Suiren’s. A statement from ExaScaler confirms that this 25.6 percent performance-per-watt boost was the result of carefully optimizing the software implementation.
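The quoted efficiency gain follows from the two Green500 figures for Suiren. As a rough check, a minimal sketch (the helper name `percent_change` is hypothetical) computes the relative change from the rounded values given above; because the inputs are rounded, the result may differ from the quoted 25.6 percent in the last digit.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


# Suiren's Green500 efficiency in gigaflops-per-watt, as quoted in the article.
suiren_nov_2014 = 4.95
suiren_jun_2015 = 6.22

gain = percent_change(suiren_nov_2014, suiren_jun_2015)
print(f"Efficiency gain: about {gain:.1f} percent")
```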
Meeting Exascale Mandates
There are two ways to look at the current state of energy efficiency, according to Green500 list custodian and Virginia Tech professor Wu Feng. In the positive sense, the trajectory set by the current list is on track to establish a 20-40 MW exascale supercomputer in the 2022 timeframe. But if the community were still targeting an exascale machine for the original 2018 timeframe, it would be a supercomputer on the order of 150 megawatts, extrapolating from this list.
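The 150-megawatt figure is consistent with a straight-line extrapolation from the top of the list. The sketch below assumes that an exascale machine delivers exactly 10^18 flops and that Shoubu's 7.03 gigaflops-per-watt would hold unchanged at that scale, which is a simplification; the function names are hypothetical and chosen for illustration only.

```python
EXAFLOP = 1e18  # flops; the assumed definition of "exascale" for this sketch


def exascale_power_mw(gflops_per_watt: float) -> float:
    """Power in megawatts for one exaflop at a fixed efficiency (linear extrapolation)."""
    watts = EXAFLOP / (gflops_per_watt * 1e9)
    return watts / 1e6


def required_gflops_per_watt(power_budget_mw: float) -> float:
    """Efficiency needed to deliver one exaflop within a given power budget."""
    return EXAFLOP / (power_budget_mw * 1e6) / 1e9


# Extrapolating from Shoubu's 7.03 gigaflops-per-watt.
shoubu_power_mw = exascale_power_mw(7.03)
print(f"Exaflop at Shoubu's efficiency: about {shoubu_power_mw:.0f} MW")

# Efficiency implied by the 20-40 MW range mentioned in the article.
for budget_mw in (20, 40):
    needed = required_gflops_per_watt(budget_mw)
    print(f"{budget_mw} MW budget needs about {needed:.0f} gigaflops-per-watt")
```

Under these assumptions, the 20 MW target implies roughly 50 gigaflops-per-watt, about seven times the efficiency of the current list leader.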
Wu gets the sense that an increasing number of people are not worried about power because vendors are saying they will hit this goal. He warns that this could be a false sense of security. “Even if we do make the 20 MW target, we’ve bought ourselves four to six years by slipping the exascale target from 2018 to 2022-2024,” he says. “So that false sense of security that ‘power is not going to be a problem’ — well it is a problem if we were still looking at 2018.”
At the same time, Wu emphasizes that the DOE is taking exascale innovation very seriously and is investing research dollars into FastForward and related projects where vendors, such as AMD, Intel and NVIDIA, are focused on maximizing performance per watt.
“So these two aspects – the lengthened runway and the investment in energy-efficient technologies – have gone a long way toward addressing the thermal power envelope of these extreme-scale supercomputers,” he says.
“Even a 150 MW system, extrapolated out from today’s list, is historically quite good given that six years ago when you extrapolated out the power envelope linearly, we were well over a gigawatt for an exascale machine. So there has been an order of magnitude improvement. Combined with this additional runway, the expectation is that we’ll cover that last order of magnitude to get to the 20 MW target,” he says.
The HPC community’s enhanced focus on power and energy as first-order design constraints is also reflected in the relative diversity of the Green500 list compared with the TOP500 list. While the last several TOP500 lists have seen very little turnover in the top ten spots, nearly every Green500 list experiences some churn at the top. The latest TOP500 saw only one new entrant in the top ten, whereas the Green500 welcomed four machines into the top tier. To be fair, of course, the barrier to entry is less prohibitive since these Green500 champions can be (and tend to be) smaller TOP500 systems that cost orders of magnitude less to build than their more FLOPS-dominant list-mates.
The fall of the thermal power envelope has mainly been driven by heterogeneous supercomputers, powered by manycore chips from vendors like AMD, NVIDIA, Intel and now PEZY, says Wu. The top 32 supercomputers on the current list made use of accelerators, compared to the top 23 supercomputers on the November 2014 edition of the list, an increase of roughly 40 percent. Wu says the heterogeneous design gives the user or developer different kinds of silicon brains that can be matched with the task at hand, yielding performance and energy efficiencies that weren’t available before.
One of those silicon “brains” that was conspicuously absent from the upper rankings of the Green500 was ARM. Wu notes that it is a challenge for ARM licensees to make the jump from embedded mobile to the HPC server space, but he thinks ARM’s day is coming.
Wu is a champion of low-power computing, going back to his creation of the Green Destiny supercomputer in 2002. That machine had a 3.2-kilowatt power budget (the equivalent of two hairdryers) and a 101-gigaflop LINPACK rating that would have placed it at number 393 on the 2002 TOP500 list. Wu says he understands the skepticism around ARM, but he expects that 64-bit ARM will begin to populate the Green500 within the next year or two.
“It could become the Toyota Prius of supercomputing,” he shares, noting that “it will have its place even if it isn’t at the upper echelons of the next exascale supercomputer.”