October 12, 2011
Appro is doing a brisk business over at the Department of Energy. After winning the DOE's second Tri-Lab Linux Capacity Cluster contract back in June, Appro has been tapped once again to provide Los Alamos National Laboratory (LANL) with yet another high performance computing cluster. The new Mustang supercomputer, installed there last month, will give the lab another 353 teraflops of number crunching capacity.
The deal is worth about $10 million to Appro, which seems to be a popular HPC vendor these days with the crowd at Los Alamos. The company has been doing business there since 2005 and apparently hasn't worn out its welcome yet.
LANL is one of the three DOE labs operated for the National Nuclear Security Administration (NNSA) to support the computational needs of the agency's stockpile stewardship program, which maintains the country's nuclear weaponry by way of virtual simulations. While much of that work is classified, the new Mustang system is slated for unclassified codes at the lab. Those include applications in ocean, wildfire, plasma physics, materials and nuclear energy research.
The system will also support the Climate, Ocean, and Sea Ice Modeling (COSIM) project, part of the climate modeling program in the DOE's Office of Science. The project's mission is to develop, test and apply ocean and ice models in support of DOE Climate Change Research. COSIM represents one of the larger user groups in the Turquoise network.
Other large user communities include the NNSA's Advanced Simulation and Computing (ASC) program and the Consortium for Advanced Simulation of Light Water Reactors (CASL). CASL applies existing modeling and simulation capabilities and develops advanced capabilities to create an environment for predictive simulation of light water reactors.
Like most HPC codes, all of these applications are hungry for compute. According to Andrew White, LANL's Deputy Associate Director for Theory, Simulation and Computation, Los Alamos saw the need for a bigger machine to scale out many of those simulations, as well as a basic need to provide more cycles to lab researchers in an unclassified environment.
Mustang will join five other machines in what is called LANL's Turquoise Open Collaborative Network. The network encompasses HPC systems that provide an unclassified computing environment for LANL researchers as well as external collaborators from other DOE labs and elsewhere.
Prior to Mustang, the aggregate performance in Turquoise was a little less than 300 teraflops. The new Mustang machine will more than double that performance and do so with a single cluster. The next largest machine in the network is Cerrillos, a 152-teraflop IBM BladeCenter system that is equipped with Cell processors (PowerXCell 8i) for extra acceleration. As such, Cerrillos can serve as a smaller version of the famous Roadrunner supercomputer, also housed at Los Alamos.
Mustang is not nearly as exotic. The new system is Appro's straight-up Xtreme-X platform, outfitted with AMD's 12-core 6100 "Magny-Cours" Opteron processors. Each of the system's 1,600 nodes houses two such CPUs and includes 64GB of memory. The nodes are lashed together with Mellanox QDR InfiniBand in single-rail fashion.
The September deployment schedule was unfortunate inasmuch as LANL was forced to go with the current 6100 Opterons instead of the newer 6200 Series "Interlagos" Opterons, which are just ramping into production now. (In fact, they were shipping to selected customers in September.) Those Bulldozer-class CPUs would have bumped the core count from 38,400 cores to 51,200 and likely delivered over half a petaflop of peak performance.
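The core counts above follow directly from the node configuration. The minimal Python sketch below reproduces that arithmetic. It assumes a 16-core part for the Interlagos case and, for the peak figure only, a 2.3 GHz clock and four double-precision flops per core per cycle for the Magny-Cours chips; neither of those two values comes from the article.

```python
# Mustang node configuration as described in the article.
NODES = 1_600
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET_6100 = 12  # Opteron 6100 "Magny-Cours"
CORES_PER_SOCKET_6200 = 16  # assumption: a 16-core Opteron 6200 "Interlagos" part

# Assumptions, not from the article: clock speed and double-precision
# flops per core per cycle for the Magny-Cours processors.
CLOCK_GHZ = 2.3
FLOPS_PER_CYCLE = 4


def total_cores(cores_per_socket: int) -> int:
    """Total cores across every node and socket in the cluster."""
    return NODES * SOCKETS_PER_NODE * cores_per_socket


def peak_teraflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: cores * GHz * flops/cycle gives GFLOPS; /1,000 gives TFLOPS."""
    return cores * clock_ghz * flops_per_cycle / 1_000


magny_cores = total_cores(CORES_PER_SOCKET_6100)
interlagos_cores = total_cores(CORES_PER_SOCKET_6200)
peak = peak_teraflops(magny_cores, CLOCK_GHZ, FLOPS_PER_CYCLE)

print(magny_cores, interlagos_cores)  # 38400 51200
print(round(peak, 2))  # 353.28
```

Under those assumptions the result lines up with the 353-teraflop figure. The sketch does not estimate an Interlagos peak, because that depends on clock speeds the article does not give.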
Likewise, Mustang arrived just a little too early for FDR InfiniBand. Opting for that technology would have pushed the deployment into 2012, says White. FDR yields a raw throughput of 56 Gbps to QDR's 40 Gbps, but the bandwidth difference is actually larger because of the more efficient data link encoding in the new InfiniBand technology.
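To see why the real gap exceeds the raw 56-versus-40 ratio, apply each generation's line encoding: QDR uses 8b/10b encoding, while FDR uses 64b/66b. The minimal Python sketch below uses the article's rounded raw rates and exact fractions for the encoding overhead. It is a back-of-the-envelope illustration that ignores protocol and packet overheads above the link layer.

```python
from fractions import Fraction

# Raw 4x-link signaling rates quoted in the article, in Gbps.
QDR_RAW_GBPS = 40
FDR_RAW_GBPS = 56

# Fraction of raw bits that carry data under each line encoding.
QDR_ENCODING = Fraction(8, 10)   # 8b/10b
FDR_ENCODING = Fraction(64, 66)  # 64b/66b


def data_rate(raw_gbps: int, efficiency: Fraction) -> Fraction:
    """Data rate left after line-encoding overhead."""
    return raw_gbps * efficiency


qdr = data_rate(QDR_RAW_GBPS, QDR_ENCODING)
fdr = data_rate(FDR_RAW_GBPS, FDR_ENCODING)
raw_ratio = Fraction(FDR_RAW_GBPS, QDR_RAW_GBPS)
data_ratio = fdr / qdr

print(f"QDR data rate: {float(qdr):.1f} Gbps")  # 32.0
print(f"FDR data rate: {float(fdr):.1f} Gbps")  # 54.3
print(f"raw ratio {float(raw_ratio):.2f}, data ratio {float(data_ratio):.2f}")  # 1.40, 1.70
```

By this estimate the raw advantage of 1.4x grows to roughly 1.7x once encoding overhead is taken into account.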
The new cluster should have a good run at LANL, though. "We plan on a four-year lifetime for Mustang," explains White. "Sometimes operational needs require us to keep a system such as this for as long as five years, but four years is better from many standpoints: parts reliability and availability and performance and energy efficiency."