February 15, 2013
Supercomputing applications in the enterprise are driven by what Stillwater CEO Theodore Omtzigt considers "valuable economic activities of the business." For example, FedEx and Exxon require extensive logistical modeling and optimization to cut their operational costs by billions of dollars. Savings on that scale justify investing billions of dollars in supercomputing, so those two companies need not worry as much about cloud-based computing.
However, according to Omtzigt, institutions that have only millions to invest in HPC would see the same proportional benefit (about 10 percent) from supercomputing applications, but because that percentage applies to a much smaller base, the overall pay-off would not justify the investment. Further, cloud-based applications have too much latency to solve complex logistical problems like FedEx's.
Stillwater's answer was to combine the two, creating essentially a cloud-based supercomputer, or "on-demand supercomputer."
"We were asked to design, construct, and deploy an on-demand supercomputing service for a Chinese cloud vendor," Omtzigt said of the inception of Stillwater's service. "The idea was to build an interconnected set of supercomputer centers in China, and offer a multi-tenant on-demand service for high-value, high-touch applications, such as logistics, digital content creation, and engineering design and optimization."
The diagram below shows the system's topology.

The design relies on a large network of interconnects that link storage servers to compute nodes. "So half the quad can fall away, and the system would still have full connectivity between storage and computes," said Omtzigt. The idea is to build several levels of redundancy into the system so that operations can continue even when certain servers take longer than expected to finish their jobs.
Keeping costs down is essential to maintaining such an infrastructure, and the designers did so in part by placing the InfiniBand (IB)-based storage on the same fabric as the compute nodes. "To lower the cost of the system, storage was designed around IB-based storage servers that plugged into the same infrastructure as the compute nodes…This is less expensive than designing a separate NAS storage subsystem, and it gives the infrastructure flexibility to build high-performance storage solutions," said Omtzigt.
Virtualization would have been another way to balance demand and keep costs down, but the designers opted for bare-metal provisioning to avoid the I/O latency penalty that virtualization imposes.
According to Omtzigt, the system sustained 18 teraflops in testing, with a peak of 19.2 teraflops, at a cost of $3.6 million. The Chinese vendor in turn rents out time on the system: a full dual-socket server with 64GB of memory goes for about $5 per hour. A content firm in Beijing is using 100 servers at a cost of $20,000 a month.
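Because the topology diagram is not reproduced here, the claim that "half the quad can fall away" can only be illustrated under assumptions. The Python sketch below hypothetically treats the "quad" as four switches, each cabled to every storage server and compute node; the switch and node names are invented for illustration. It fails every possible pair of switches and uses a breadth-first search to confirm that each storage server can still reach each compute node.

```python
from collections import deque
from itertools import combinations

# Hypothetical topology: the real diagram is not available, so we assume a
# "quad" of four switches, each cabled to every storage server and compute node.
SWITCHES = ["sw0", "sw1", "sw2", "sw3"]
STORAGE = ["stor0", "stor1"]
COMPUTE = ["node0", "node1", "node2", "node3"]


def build_links(live_switches):
    """Return an undirected adjacency map linking each endpoint to every live switch."""
    adj = {n: set() for n in live_switches + STORAGE + COMPUTE}
    for sw in live_switches:
        for endpoint in STORAGE + COMPUTE:
            adj[sw].add(endpoint)
            adj[endpoint].add(sw)
    return adj


def reachable(adj, start):
    """Breadth-first search returning every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def full_connectivity(live_switches):
    """True if every storage server can reach every compute node."""
    adj = build_links(live_switches)
    return all(set(COMPUTE) <= reachable(adj, s) for s in STORAGE)


# Fail every possible half of the quad (any 2 of the 4 switches) and check the rest.
results = {
    failed: full_connectivity([sw for sw in SWITCHES if sw not in failed])
    for failed in combinations(SWITCHES, 2)
}
print(all(results.values()))   # True under these assumptions
print(full_connectivity([]))   # False: with no switches, nothing is connected
```

Under this assumed full-mesh cabling, even one surviving switch keeps storage and compute connected; the actual Stillwater topology may be wired differently, so the sketch shows the kind of check involved rather than the real design.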
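The quoted figures can be cross-checked with a few lines of arithmetic. This sketch assumes that "a peak of 19.2" means 19.2 teraflops of theoretical peak, and that the Beijing firm's entire bill is charged at the approximate $5 per server-hour rate, so the derived hours are estimates rather than reported numbers.

```python
# Figures quoted in the article.
SUSTAINED_TFLOPS = 18.0
PEAK_TFLOPS = 19.2            # assumed to be theoretical peak, in teraflops
SYSTEM_COST_USD = 3_600_000
RATE_PER_SERVER_HOUR = 5.0    # "about $5/hour" for a dual-socket, 64GB server
SERVERS_RENTED = 100
MONTHLY_BILL_USD = 20_000

# Fraction of peak actually sustained in testing.
efficiency = SUSTAINED_TFLOPS / PEAK_TFLOPS
# Capital cost divided by sustained performance.
cost_per_sustained_tflop = SYSTEM_COST_USD / SUSTAINED_TFLOPS
# Total server-hours the monthly bill buys at the quoted rate (an estimate).
server_hours = MONTHLY_BILL_USD / RATE_PER_SERVER_HOUR
# Average hours each of the 100 rented servers is used per month.
hours_per_server = server_hours / SERVERS_RENTED

print(f"efficiency: {efficiency:.2%}")                                  # 93.75%
print(f"cost per sustained teraflop: ${cost_per_sustained_tflop:,.0f}")  # $200,000
print(f"server-hours per month: {server_hours:,.0f}")                   # 4,000
print(f"hours per server per month: {hours_per_server:.0f}")            # 40
```

At roughly 40 hours per server in a month of about 720 hours, the firm appears to be renting capacity intermittently rather than around the clock, which is the usage pattern an on-demand service is meant to serve.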
Ease of use, pay-per-use pricing, and lower setup and operating costs are compelling traits. The notion of a redundant, bare-metal supercomputing service will be worth watching in the months and years to come.