Eurotech Hive Takes The Sting Out Of Density

By Timothy Prickett Morgan

November 21, 2014

Back at the International Supercomputing Conference in June, supercomputer maker Eurotech dropped some hints about its future water-cooled Aurora systems that would employ a mix of ARM processors and Nvidia Tesla GPU accelerators in a dense form. At the SC14 conference this week, these machines have now been officially launched as the Aurora Hive systems, and it turns out that the systems will also allow customers to build massively parallel machines based on Intel Xeon processors and Xeon Phi coprocessors.

The Hive systems use a modular enclosure that is based on a cubic shape rather than a hexagonal one, but the concept of densely stacking compute elements while isolating them from each other, as a beehive does, holds true. The system crams up to 128 nodes (which are called bricks) into a single rack: 64 nodes in the front and another 64 nodes in the back. You can do that when you water-cool the components of the nodes, because you do not have to worry about airflow from cold to hot aisles through each rack.

The Hive system makes use of a second generation of direct hot water cooling from the Aurora line, which Fabio Gallo, Eurotech HPC business unit managing director, tells HPCwire can cool a system with 50 degree Celsius (122 degrees Fahrenheit) inlet water temperature. The new water cooling is lighter and more compact, allowing for more compute and cooling to be crammed into the same space. The water distribution system is built right into the Aurora Hive rack, and there are dripless connectors for inlet cold (relatively speaking) and outlet hot water coming off each node. Being able to take the heat away quickly and efficiently is vital because a fully configured Hive rack draws 166 kilowatts of juice.
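To put the cooling figures in perspective, here is a rough back-of-the-envelope sketch using the standard heat-balance relation Q = ṁ·c_p·ΔT. The 166 kilowatt rack load and the 50 degree Celsius inlet temperature come from the article. Eurotech did not give an inlet-to-outlet temperature rise, so the 5 and 10 kelvin rises below are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
# Back-of-the-envelope sketch: water flow needed to carry a rack's heat away,
# using the heat-balance relation Q = m_dot * c_p * delta_T.
# The 166 kW load is from the article; the temperature rises are ASSUMED.

SPECIFIC_HEAT_WATER = 4186.0  # J/(kg*K), approximate value for liquid water
WATER_DENSITY = 1.0           # kg/L, rounded (closer to 0.99 kg/L at 50 C)


def required_flow_lpm(heat_load_w: float, delta_t_k: float) -> float:
    """Litres per minute of water needed to absorb heat_load_w watts
    while the water warms by delta_t_k kelvin between inlet and outlet."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_s = heat_load_w / (SPECIFIC_HEAT_WATER * delta_t_k)
    return mass_flow_kg_s / WATER_DENSITY * 60.0


def celsius_to_fahrenheit(deg_c: float) -> float:
    return deg_c * 9.0 / 5.0 + 32.0


if __name__ == "__main__":
    RACK_LOAD_W = 166_000  # fully configured Hive rack, per the article
    for rise_k in (5.0, 10.0):  # hypothetical inlet-to-outlet temperature rise
        flow = required_flow_lpm(RACK_LOAD_W, rise_k)
        print(f"rise of {rise_k:.0f} K -> about {flow:.0f} L/min")
    print(celsius_to_fahrenheit(50.0))  # 122.0, matching the article
```

Halving the allowed temperature rise doubles the flow the rack's built-in water distribution has to move.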

“You can free cool this machine nearly anywhere on earth,” says Gallo. By Eurotech’s math, customers using the Aurora Hive should be able to attain a power usage effectiveness of 1.05, which is about as good as the hyperscale datacenter operators are getting. (PUE, as this metric is abbreviated, is the total power consumed by a datacenter divided by the power consumed by its compute, storage, and network components. Getting as close as possible to 1 is the goal.)
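As a rough illustration of what a PUE of 1.05 implies, the sketch below computes the non-IT overhead (cooling, power conversion, and so on) for a given IT load. Treating one fully configured 166 kilowatt Hive rack as the entire IT load is an assumption made only for the example, and the helper names are hypothetical.

```python
# Sketch of the PUE arithmetic: total facility power divided by IT power.


def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power usage effectiveness; 1.0 would mean zero cooling/distribution overhead."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


def overhead_kw(it_load_kw: float, target_pue: float) -> float:
    """Non-IT power (cooling, power conversion, etc.) implied by a given PUE."""
    return it_load_kw * (target_pue - 1.0)


if __name__ == "__main__":
    it_kw = 166.0  # ASSUMPTION: treat one full Hive rack as the entire IT load
    extra = overhead_kw(it_kw, 1.05)
    print(f"overhead {extra:.1f} kW, facility total {it_kw + extra:.1f} kW")
    print(f"PUE check: {pue(it_kw + extra, it_kw):.2f}")
```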

The Hive nodes are 3U high, and you can put them into a rack four across and sixteen high. (Each node is 130 mm high by 105 mm wide by 325 mm deep.) Each node has a system board that includes risers for a compute module and five coprocessor modules; this system board also includes a PCI-Express 3.0 switch from PLX Technology (now part of Avago Technologies) that links the compute and coprocessor elements to each other. The PCI-Express switch also has hooks out to network adapters, in this case a two-port FDR InfiniBand adapter from Mellanox Technologies. All of the PCI-Express slots have the full bandwidth of an x16 slot, which means Nvidia Tesla GPUs and Intel Xeon Phi coprocessors can both find a place.
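The layout and slot figures reduce to a little arithmetic. The sketch below multiplies out the four-across, sixteen-high, front-and-back arrangement, and estimates what a full x16 slot carries. PCI-Express 3.0 signals at 8 GT/s per lane with 128b/130b encoding, so an x16 slot moves roughly 15.75 GB/s in each direction before packet and protocol overhead. The constant and function names are illustrative only.

```python
# Sketch of the Hive rack layout and PCIe 3.0 x16 bandwidth.

NODES_ACROSS = 4   # per the article
NODES_HIGH = 16    # per the article
RACK_SIDES = 2     # front and back; no cold-aisle/hot-aisle airflow to preserve


def nodes_per_rack() -> int:
    return NODES_ACROSS * NODES_HIGH * RACK_SIDES


# PCI-Express 3.0 signals at 8 GT/s per lane and uses 128b/130b encoding.
PCIE3_GT_PER_LANE = 8.0
PCIE3_ENCODING_EFFICIENCY = 128.0 / 130.0


def pcie3_gb_per_s(lanes: int) -> float:
    """Approximate data bandwidth per direction in GB/s.
    Ignores packet/protocol overhead, so real throughput is lower."""
    gbits = PCIE3_GT_PER_LANE * PCIE3_ENCODING_EFFICIENCY * lanes
    return gbits / 8.0  # bits -> bytes


if __name__ == "__main__":
    print(nodes_per_rack())                  # 128
    print(f"{pcie3_gb_per_s(16):.2f} GB/s")  # x16 slot, per direction
```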

Eurotech’s first Hive system will have a CPU compute element that is based on Intel’s “Haswell” Xeon E3-1200 v3 processors. This family of chips has four cores and clock speeds that range from 3.1 GHz to 3.7 GHz in standard versions. The Intel E3-1200 v3 compute node has 32 GB of memory soldered onto it for low clearance and also has a 256 GB half-height 1.8-inch solid state disk drive. You can use any E3-1200 v3 chip that has a thermal design point of 84 watts or lower.

The compute brick allows for up to four coprocessors to be fitted with cold plates that suck the heat off their components; the coprocessors are linked over the PCI-Express switch into the PCI-Express controllers on the E3-1200 processor. Gallo tells HPCwire that Eurotech will ship the Xeon E3-1200 plus Xeon Phi configuration to initial customers in a few weeks, and that support for the combination of the Xeon E3 processor and Nvidia’s Tesla K40 coprocessor will follow a few months after that. The Xeon Phi 7120X is rated at 1.2 teraflops doing double precision floating point math, while the Tesla K40 card has a base performance of 1.43 teraflops that can rise to 1.66 teraflops with GPU Boost overclocking turned on. That works out to 614 teraflops per rack with Xeon Phis and 732 teraflops per rack with the Tesla K40s (not counting the extra performance from GPU Boost).
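The per-rack figures follow from multiplying each card’s peak rating by four coprocessors per node and 128 nodes per rack. This sketch assumes every node is fully populated and, like the article’s numbers, ignores the flops contributed by the Xeon E3 host processors. The names are illustrative.

```python
# Sketch: peak double-precision teraflops per Hive rack from per-card ratings.

NODES_PER_RACK = 128
COPROCESSORS_PER_NODE = 4  # "up to four coprocessors" per compute brick

# Per-card peak DP ratings quoted in the article, in teraflops.
CARD_DP_TFLOPS = {
    "Xeon Phi 7120X": 1.2,
    "Tesla K40 (base)": 1.43,
    "Tesla K40 (GPU Boost)": 1.66,
}


def rack_peak_tflops(card_tflops: float,
                     cards_per_node: int = COPROCESSORS_PER_NODE,
                     nodes: int = NODES_PER_RACK) -> float:
    """Coprocessor peak only; the Xeon E3 hosts' own flops are not counted."""
    return card_tflops * cards_per_node * nodes


if __name__ == "__main__":
    for card, tflops in CARD_DP_TFLOPS.items():
        print(f"{card:22s} {rack_peak_tflops(tflops):6.1f} TF per rack")
```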

Back in June at ISC, Eurotech was talking up the Hive system (which did not yet have that name) by saying that it would be delivering a variant of the system that would marry a 64-bit ARM processor from Applied Micro with Tesla GPU coprocessors, and you might have gotten the impression that this would come out first. While Applied Micro is shipping its “Storm” X-Gene 1 chip now, it is readying the much-better “Shadowcat” X-Gene 2 processor, which has been sampling since August. This chip will support the RDMA over Converged Ethernet (RoCE) protocol over its integrated Ethernet network interface controllers, simplifying the components that go into an ARM server node. The X-Gene 1 and 2 chips have two 10 Gb/sec Ethernet ports on the die, and these can be hooked right into adapter ports. That, in theory, leaves more room for other peripherals in the complex. The plan is to ship the X-Gene 2 as the ARM option for the CPU side of the hybrid node, along with the Tesla K40 cards as coprocessors, sometime around the second quarter of 2015.

Incidentally, Eurotech is able to get its hands on a Tesla K40 card with modified thermal plates so it fits into the super-skinny Hive module. The new Tesla K80 coprocessor card, announced this week at SC14, will be a bit tricky to add to the Aurora Hive system, explains Gallo, because this dual-GPU card has some of its power connectors across the top of the card. This does not work with the very tight tolerances in the Hive module, which are necessitated by the thermal conduction plates. With the Tesla K80 offering a base 1.87 teraflops of double precision math with a GPU Boost of up to 2.91 teraflops, you can bet some customers will want this. Gallo says that there is enough thermal capacity to pull the heat off this 300 watt part, if the connectors can be sorted. Being able to double the flops in the box is a pretty strong motivator to solve this engineering problem.
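To see where the “double the flops” figure comes from, here is a per-node comparison. It assumes, hypothetically, that four K80 cards could occupy the same slots as four K40s (Eurotech has not said this configuration will ship), and it measures both K80 figures against the K40 base rating used for the 732 teraflop rack total. The names are illustrative.

```python
# Hypothetical per-node comparison: four K40s vs four K80s (not yet supported).

CARDS_PER_NODE = 4  # ASSUMPTION: K80s would occupy the same four coprocessor slots
K40_BASE_TFLOPS = 1.43                          # article figure used for the rack totals
K80_TFLOPS = {"base": 1.87, "GPU Boost": 2.91}  # article figures for the K80


def node_tflops(card_tflops: float, cards: int = CARDS_PER_NODE) -> float:
    return card_tflops * cards


def speedup_over_k40_base(k80_tflops: float) -> float:
    return node_tflops(k80_tflops) / node_tflops(K40_BASE_TFLOPS)


if __name__ == "__main__":
    print(f"K40 base: {node_tflops(K40_BASE_TFLOPS):.2f} TF per node")
    for mode, tflops in K80_TFLOPS.items():
        print(f"K80 {mode}: {node_tflops(tflops):.2f} TF per node, "
              f"{speedup_over_k40_base(tflops):.2f}x the K40 base")
```

At base clocks the gain is closer to 1.3X; the doubling depends on the K80 sustaining its GPU Boost rating.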

Generally speaking, the x86 processor option plus either the Xeon Phi or Tesla GPU accelerators draws about 1,500 watts per node, which works out to around 5 gigaflops per watt. The top machines on the Green500 ranking of supercomputers are in the range of 4 gigaflops per watt.

Gallo is tight-lipped about what other processing components Eurotech might add to the Aurora Hive system, but next year’s “Knights Landing” Xeon Phi, which will be sold as a standalone processor as well as a PCI-Express coprocessor, will obviously slide right into this system. At 3 teraflops of double-precision floating point performance per card, and with the ability to put in five cards, this will be a radical increase in the math capabilities. And for densely packed, CPU-only workloads that use low-speed Ethernet, Eurotech could make Hive bricks based solely on Xeon E3 processors or on various ARM processors that sport their own networking on the chip. If you take out the network card, that leaves room for six CPU-only compute cards per module, or 768 processors per rack. Another option would be to add flash drive cards that use the high-speed, low-latency NVM Express protocol to link into that PCI-Express switch. You could also swap out some of the flash drives for GPU cards and do visualization in the same nodes where the data is stored. Eurotech has lots of options with the Aurora Hive architecture, and that is by design.
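The speculative configurations Gallo hints at also reduce to simple multiplication. This sketch assumes 128 bricks per rack, five Knights Landing cards per brick at 3 teraflops each, and six CPU-only cards per brick once the network card is removed; none of these configurations has been announced, and the names are illustrative.

```python
# Sketch of the speculative Hive configurations; none has been announced.

NODES_PER_RACK = 128

KNL_DP_TFLOPS = 3.0          # next-gen "Knights Landing" Xeon Phi, per the article
KNL_CARDS_PER_NODE = 5       # "the ability to put in five cards"
CPU_ONLY_CARDS_PER_NODE = 6  # network card removed, per the article


def per_rack(per_node: float, nodes: int = NODES_PER_RACK) -> float:
    return per_node * nodes


if __name__ == "__main__":
    knl_node_tf = KNL_DP_TFLOPS * KNL_CARDS_PER_NODE
    print(f"Knights Landing: {knl_node_tf:.0f} TF per node, "
          f"{per_rack(knl_node_tf):.0f} TF per rack")
    print(f"CPU-only: {per_rack(CPU_ONLY_CARDS_PER_NODE):.0f} processors per rack")
```

At roughly 1.9 petaflops of coprocessor peak per rack, a Knights Landing Hive would offer about 2.6 times the 732 teraflops of the Tesla K40 configuration.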

But initially at least, Eurotech is going after the workloads that have been accelerated. “There are markets where accelerated applications have become the norm instead of an exotic thing,” says Gallo. “Geosciences, particularly reverse time migration reservoir analysis, is a good example. In general, signal processing will be interesting on this system, as will be machine learning, analytics, and some computer-aided engineering tools that have been modified for accelerators.”

The Aurora Hive comes preconfigured with the CentOS 6.x variant of Linux and support from Eurotech for this distribution, but customers can deploy other Linux operating systems on the machine as needed. Scientific Linux, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, and Canonical Ubuntu Server are all supported. The Aurora software stack includes support for Intel Cluster Studio, Nvidia CUDA, Intel’s Manycore Platform Software Stack (MPSS) for the Xeon Phi, and the GCC compilers, as well as the Intel MPI, Open MPI, and MVAPICH2 communication libraries.

Pricing for the Aurora Hive system was not available, and the question is what kind of premium Eurotech can charge for density and hot water cooling. The combination of the two should allow Eurotech to command a premium for its systems over plain vanilla clusters based on rack or blade servers, but how much of a premium is an open question. The market will decide.
