In June, HPCwire highlighted the new MN-3 supercomputer: a 1.6 Linpack petaflops system delivering 21.1 gigaflops per watt of power, making it the most energy-efficient supercomputer in the world – at least, according to the latest Green500 list, the Top500’s energy-conscious cousin. The system was built by Preferred Networks, a Japanese AI startup that used its in-house MN-Core accelerator to help deliver the MN-3’s record-breaking efficiency. Collaborating with Preferred Networks was modular system manufacturer Supermicro, which detailed the hardware and processes behind the chart-topping green giant in a recent report.
As Supermicro tells it, Preferred Networks was facing challenges on two fronts: first, the need for a much more powerful system to solve its clients’ deep learning problems; and second, the exorbitant operating costs of the system they were envisioning. “With increasing power costs, a large system of the size PFN was going to need, the operating costs of both the power and associated cooling would exceed the budget that was allocated,” Supermicro wrote. “Therefore, the energy efficiency of the new solution would have to be designed into the system, and not become an afterthought.”
Preferred Networks turned to partnerships to help resolve these problems. First, they worked with researchers at Kobe University to develop the MN-Core accelerator, specializing it for deep learning training processes and optimizing it for energy efficiency. After successfully benchmarking the MN-Core above one teraflop per watt in testing, the developers turned to the rest of the system – and that’s where Supermicro entered the picture.
On a visit to Japan, Clay Chen – general manager of global business development at Supermicro – sat down with Preferred Networks to hear what they needed.
“At first I was asking them, you know, what type of GPU they are using,” Chen said in an interview with HPCwire. “They say, ‘oh, no, we’re not using any type – we’re going to develop our own GPU.’ And that was quite fascinating to me.”
Preferred Networks selected Supermicro for the daunting task: fitting four MN-Core boards, two Intel Xeon Platinum CPUs, up to 6TB of DDR4 memory and Intel Optane persistent memory modules in a single box without sacrificing the energy efficiency of the system.
Supermicro based its design on one of its preexisting GPU server models, which was built to house multiple GPUs (or other accelerators) and high-speed interconnects. Working with Preferred Networks’ engineers, Supermicro ran simulations to find the chassis design and component arrangement that would keep the MN-Core accelerators sufficiently cooled while preserving the system’s energy efficiency.
Somewhat surprisingly, the custom server is entirely fan-cooled. “Our concept is: if we can design something with fan cooling, why would we want to use liquid cooling?” Chen said. “Because essentially, all the heat being pulled out from the liquid is going to cool somewhere. When you take the heat outside the box, you still need to cool the liquid with a fan.”
The end result, a customized Supermicro server just for Preferred Networks, is pictured below.
The server’s four MN-Core boards are connected to PCIe x16 slots on a Supermicro motherboard and to the MN-Core Direct Connect board that enables high-speed communication between the MN-Core boards.
These custom servers – each 7U high – were then rack-mounted into what would become the MN-3 supercomputer: 48 servers, four interconnect nodes and five 100GbE switches. In total, the system’s 2,080 CPU cores, delivering 1,621 Linpack teraflops of performance, required just 77 kW of power for the Top500 benchmarking run. That efficiency is roughly 15 percent short of the 25 gigaflops per watt that a one-exaflops machine would need to stay within the 40-megawatt limit targeted by planned exascale systems like Aurora, Frontier and El Capitan.
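The efficiency comparison above can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted in this article (1,621 teraflops and 77 kW) and assumes that "exascale" means exactly one exaflops within the 40-megawatt limit; the variable and function names are illustrative. Because the inputs are rounded, the result lands slightly below the 21.1 gigaflops per watt reported on the Green500 list, which presumably reflects more precise measurements.

```python
# Rounded figures quoted in the article for the MN-3 Top500 benchmarking run.
LINPACK_TFLOPS = 1_621   # Linpack performance, in teraflops
POWER_KW = 77            # power drawn during the run, in kilowatts

# Assumption (not stated in the article): an exascale system delivers
# exactly 1 exaflops while staying within the 40-megawatt limit.
EXASCALE_FLOPS = 1e18
EXASCALE_POWER_LIMIT_W = 40e6


def gflops_per_watt(flops: float, watts: float) -> float:
    """Return energy efficiency in gigaflops per watt."""
    return flops / watts / 1e9


# Convert teraflops -> flops and kilowatts -> watts before dividing.
mn3_eff = gflops_per_watt(LINPACK_TFLOPS * 1e12, POWER_KW * 1e3)
target_eff = gflops_per_watt(EXASCALE_FLOPS, EXASCALE_POWER_LIMIT_W)

# How far MN-3 falls below the efficiency implied by the exascale target.
shortfall_pct = (1 - mn3_eff / target_eff) * 100

print(f"MN-3 efficiency:      {mn3_eff:.2f} GF/W")
print(f"Exascale target:      {target_eff:.2f} GF/W")
print(f"Shortfall vs. target: {shortfall_pct:.1f}%")
```

With these inputs, MN-3 comes out at about 21 gigaflops per watt against a 25 gigaflops-per-watt target, which is where the roughly 15 percent gap comes from.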
“We are very pleased to have partnered with Supermicro, who worked with us very closely to build MN-3, which was recognized as the world’s most energy-efficient supercomputer,” said Yusuke Doi, VP of computing infrastructure at Preferred Networks. “We can deliver outstanding performance while using a fraction of the power that was previously required for such a large supercomputer.”