HPCwire

Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them

Gravity Attracts a GigE HPC Cluster


Not all supercomputing rides on InfiniBand or proprietary interconnects. For technical applications that decompose neatly into loosely-coupled threads, a big cluster with vanilla Gigabit Ethernet does just fine. The persistence of Ethernet on the TOP500 attests to the interconnect's continued viability on big clusters. On the latest June list, GigE is being used on 284 of the top systems, which is actually slightly up from the 273 recorded in November 2007. But as clusters scale out into hundreds or even thousands of nodes, Ethernet infrastructure can grow into a complex burden of cables and multi-layer switches.

The top Ethernet system on the TOP500 list -- at number 58 -- is the new ATLAS cluster at the Max Planck Institute for Gravitational Physics in Germany. Installed earlier this year, the ATLAS system is being used in the Institute's quest to detect gravitational waves -- one of the big prizes remaining in physics. A gravitational wave is a fluctuation in the curvature of space-time that is theorized to occur as the result of cosmic events in the early universe, or more recently, from the extreme gravitational fields generated by neutron stars and black holes. First predicted by Albert Einstein in 1916 as a consequence of his General Theory of Relativity, gravitational waves have never been directly measured. Researchers hope that the large arrays of laser interferometers deployed in the U.S., Italy and Germany will provide evidence of the elusive waves.

Because the effects of gravitational waves are so subtle here on Earth, very large quantities of data must be collected, and enormous computational power must be brought to bear to prove their existence. It is hoped that the ATLAS system will provide a platform to help move this effort forward. The 32.8 teraflop (Linpack) machine is made up of 1,342 single-socket compute nodes, occupying 32 racks.

Each ATLAS compute server has a 2.4 GHz Intel quad-core Xeon processor and communicates with the rest of the system via a 1 Gigabit link to a top-of-rack Woven TRX 100 Ethernet switch, which acts as a GigE aggregator with four 10 GigE uplinks. The uplinks funnel the server data to the 144-port 10 GigE Woven EFX 1000 core switch. Since the configuration is not over-subscribed, non-blocking Ethernet communication is provided for each server.
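To make the over-subscription point concrete, here is a minimal sketch of how the ratio for a single edge switch can be worked out. The article does not say how many servers sit behind each TRX 100, so the server counts below (40 and 48) are hypothetical values chosen only for illustration; the link speeds and the four uplinks come from the description above.

```python
def oversubscription_ratio(servers, server_gbps, uplinks, uplink_gbps):
    """Downstream bandwidth divided by upstream bandwidth for one edge switch.

    A ratio of 1.0 or less means the uplinks can carry every server
    transmitting at full rate at once (no over-subscription).
    """
    downstream = servers * server_gbps
    upstream = uplinks * uplink_gbps
    return downstream / upstream


# Hypothetical case: 40 GigE servers behind four 10 GigE uplinks.
balanced = oversubscription_ratio(servers=40, server_gbps=1, uplinks=4, uplink_gbps=10)

# Hypothetical contrast: 48 GigE servers behind the same four uplinks.
crowded = oversubscription_ratio(servers=48, server_gbps=1, uplinks=4, uplink_gbps=10)

print(f"40 servers: {balanced:.2f}, 48 servers: {crowded:.2f}")
```

Under these assumed counts, 40 servers exactly match the 40 Gbps of uplink capacity, while 48 servers would over-subscribe the uplinks by a factor of 1.2.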

Because of the amount of data involved in gravitational wave analysis, the ATLAS compute servers are hooked up to 1.3 petabytes of external storage. The storage consists of 42 separate file nodes, 30 of which are GigE-linked servers connected via another TRX 100; the other 12 are 10 GigE-connected Sun Microsystems "Thumper" file servers directly hooked into the EFX 1000 core switch. An additional 500 GB of direct-connected storage is provided on each compute node. The CPU on any server can access the local disk storage on any other server as well as the central storage nodes.

Unlike more tightly-coupled MPI codes, analysis of gravitational wave data is an embarrassingly parallel application that lends itself to a server-farm-style setup. Each node performs very data-intensive computations, but node-to-node communication is minimal. Most of the data communication takes place between the compute nodes and the storage.
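The shape of an embarrassingly parallel workload can be sketched in a few lines. This is not the Institute's analysis code: `analyze_segment` is a hypothetical stand-in (a simple mean squared amplitude), and a local thread pool stands in for the cluster's compute nodes. The point is only that each data segment is processed independently, with no exchange between workers.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_segment(segment):
    """Hypothetical per-segment analysis: mean squared amplitude."""
    return sum(x * x for x in segment) / len(segment)


def analyze_all(segments, workers=4):
    # Each segment is handled on its own; workers never talk to each other,
    # they only read their input and hand back a result.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_segment, segments))


# Made-up sample data, standing in for chunks read from storage.
segments = [[1.0, -1.0, 2.0], [0.5, 0.5], [3.0]]
results = analyze_all(segments)
print(results)
```

Because the segments share nothing, adding workers adds throughput without adding worker-to-worker traffic, which is why the heavy communication in such a setup is between compute and storage.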

Because of the highly parallel nature of the code and the reliance on low-latency I/O communications, the more granular, single-socket servers were the best fit for the application. Bruce Allen, director of the Max Planck Institute for Gravitational Physics in Hannover, Germany, who led the specification of the ATLAS system, determined that even at the computational scale of the ATLAS system, a Gigabit Ethernet interconnect was the logical choice. "Something like InfiniBand or Myrinet would have been overkill for this kind of application," he said.

What he really liked about Woven's solution was how well designed and how cost-effective it was, and also how easily it scaled up to the 1,000-plus-node cluster he had in mind. Since the EFX 1000 incorporates 144 10 GigE ports, this single core switch, along with the TRX edge switches, supported compute and storage communication for the entire cluster. Another attraction of the Woven technology is its ability to dynamically determine the optimal path for the data. The vSCALE chip in the switch is constantly monitoring latency of the active and alternative paths in the Ethernet fabric. If it finds an alternative path with lower latency, the hardware redirects traffic to take advantage of the faster route. This is especially advantageous when all the nodes are accessing both central storage and local disks on the other nodes. According to Allen, the Woven hardware was better designed and more flexible than any other Ethernet solution they looked at.
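The rerouting behavior described above can be illustrated with a small sketch. This is not Woven's vSCALE algorithm, whose details the article does not give; the function name, path labels and latency figures are all hypothetical. It only shows the decision rule: stay on the active path unless a measured alternative is strictly faster.

```python
def pick_path(current, latencies_us):
    """Return the path to use given measured latencies in microseconds.

    Traffic stays on the current path unless some alternative shows a
    strictly lower latency, which avoids pointless switching on ties.
    """
    best = min(latencies_us, key=latencies_us.get)
    if latencies_us[best] < latencies_us[current]:
        return best
    return current


# Hypothetical measurements for three paths through the fabric.
latencies = {"path_a": 12.0, "path_b": 9.5, "path_c": 15.0}

print(pick_path("path_a", latencies))  # moves to the faster path_b
print(pick_path("path_b", latencies))  # already on the fastest path, stays put
```

A hardware implementation would repeat this comparison continuously as latencies change; the sketch evaluates it once per call.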

"What is remarkable about the ATLAS cluster is that we were able to take the lead very cost-effectively with a creative combination of more processors at lower clock rates and a higher Ethernet switching efficiency," explained Allen in a press release on Tuesday. "Woven's 10 Gigabit Ethernet Fabric switch is able to deliver sustained performance at an impressive 64 percent of the theoretical peak. The HPC Linpack experts we consulted tell us that they have never seen such a high level of Ethernet efficiency on such a large cluster. Without the Woven switch, ATLAS would not be the world's fastest Ethernet cluster. It's that simple."
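The 64 percent figure can be cross-checked against the hardware numbers given earlier in the article. The sketch below assumes each core retires 4 double-precision floating-point operations per clock cycle, a commonly used figure for Intel Xeons of this generation that the article itself does not state; the node count, clock rate and Linpack result are taken from the text.

```python
NODES = 1342            # single-socket compute nodes (from the article)
CORES_PER_NODE = 4      # quad-core Xeon (from the article)
CLOCK_GHZ = 2.4         # processor clock (from the article)
FLOPS_PER_CYCLE = 4     # ASSUMPTION: double-precision flops per core per cycle
LINPACK_TFLOPS = 32.8   # measured Linpack result (from the article)

# Theoretical peak = nodes x cores x clock x flops per cycle, in teraflops.
peak_tflops = NODES * CORES_PER_NODE * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000

# Linpack efficiency = sustained performance / theoretical peak.
efficiency = LINPACK_TFLOPS / peak_tflops

print(f"peak: {peak_tflops:.1f} teraflops, efficiency: {efficiency:.0%}")
```

Under that assumption the theoretical peak comes to roughly 51.5 teraflops, and 32.8 teraflops of sustained Linpack performance works out to about 64 percent of it, consistent with the figure Allen quotes.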

Allen has also helped develop an even larger system that is being used to process the gravitational wave data. This one is also Ethernet-based, but communicates at sub-GigE speeds. The Einstein@Home project is a distributed grid of personal computers, and like its ATLAS sibling, is used to crunch some of the same laser interferometer data collected from around the world.

According to Allen, the current Einstein grid represents over 150 teraflops of computing power and adds about 2,000 new personal computers each day. Like the larger Folding@Home and SETI@Home projects, Einstein@Home relies on the kindness of strangers to donate spare PC cycles for the advancement of science. And while not as efficient as the Institute's ATLAS supercomputer, the grid offers a lot of extra capacity for wave calculations. Between ATLAS and Einstein@Home, another mystery of the universe may finally be revealed.
