The Weekly Top Five features the five biggest HPC stories of the week, condensed for your reading pleasure. This week, we cover the NC State effort to overcome the memory limitations of multicore chips; the sale of the first-ever commercial quantum computing system; Cray’s first GPU-accelerated machine; speedier machine learning algorithms; and the connection between shrinking budgets and increased reliance on modeling and simulation.
Research Technique Addresses Multicore Memory Limitations
A new technique developed by researchers at North Carolina State University promises to boost multicore chip performance by 10 to 40 percent. The new approach is two-pronged, using a combination of bandwidth allocation and “prefetching” strategies.
One of the limitations to multicore performance is the memory problem. Each core needs to access off-chip data, but there is only so much bandwidth available. With the proliferation of multicore designs, the data pathway is all the more congested. The NC State researchers developed a system of bandwidth allocation based on the fact that some cores require more access to off-chip data than others. Implementing an on-chip memory store (cache-based) allows the chip to prefetch data. When prefetching is used on an intelligent, as-needed basis, performance is further enhanced.
With both sets of criteria working in tandem, “researchers were able to boost multicore chip performance by 40 percent, compared to multicore chips that do not prefetch data, and by 10 percent over multicore chips that always prefetch data,” the release explained.
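The release does not spell out the criteria the NC State team uses, so the following Python sketch is purely illustrative. It assumes two hypothetical policies: off-chip bandwidth is shared in proportion to each core's measured demand, and prefetching is enabled only for cores whose past prefetches were mostly useful. The function names, the demand and accuracy metrics, and the 0.5 threshold are all assumptions made for this example, not details from the research.

```python
def allocate_bandwidth(demands, total_bandwidth):
    """Split total_bandwidth across cores in proportion to demand.

    `demands` maps a core name to its off-chip request count
    (a hypothetical metric chosen for this sketch).
    """
    total_demand = sum(demands.values())
    if total_demand == 0:
        # No core is requesting data, so fall back to an even split.
        share = total_bandwidth / len(demands)
        return {core: share for core in demands}
    return {
        core: total_bandwidth * demand / total_demand
        for core, demand in demands.items()
    }


def choose_prefetching(accuracy, threshold=0.5):
    """Enable prefetching only where it has been paying off.

    `accuracy` maps a core name to the fraction of its past
    prefetches that were actually used (hypothetical metric);
    `threshold` is an arbitrary cutoff picked for illustration.
    """
    return {core: acc >= threshold for core, acc in accuracy.items()}


if __name__ == "__main__":
    shares = allocate_bandwidth({"core0": 30, "core1": 10}, total_bandwidth=8.0)
    prefetch = choose_prefetching({"core0": 0.9, "core1": 0.2})
    print(shares)
    print(prefetch)
```

In this toy setup the data-hungry core receives the larger share of the pathway, and the core whose prefetches are rarely used stops spending bandwidth on them, which is the general intuition behind combining the two strategies.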
First-Ever Commercial Quantum Computing System Sold
Vancouver-based research outfit D-Wave Systems, Inc. began generating buzz in 2007 when the company announced it had built the first commercially viable quantum computer. The claim was difficult to verify and met with a fair amount of skepticism.
Now, four years later, D-Wave has announced the first sale of a quantum computing system, known as D-Wave One, to Lockheed Martin Corporation. As part of a multi-year contract, “Lockheed Martin and D-Wave will collaborate to realize the benefits of a computing platform based upon a quantum annealing processor, as applied to some of Lockheed Martin’s most challenging computation problems.” D-Wave will also be providing Lockheed with maintenance and related services.
The D-Wave One relies on a technique called quantum annealing, which provides the computational framework for a quantum processor. It was also the subject of an article published in the May 12 edition of Nature. The computer’s 128-qubit processor, known as Rainier, relies on quantum mechanics to tackle the most complex computational problems. While Lockheed Martin’s exact interest in the system was not specified, suitable applications include financial risk analysis, object recognition and classification, bioinformatics, cryptology and more.
A Physics World article cited expert corroboration regarding the system’s authenticity. MIT’s William Oliver, although not part of the research team, went on record as saying: “This is the first time that the D-Wave system has been shown to exhibit quantum mechanical behaviour.” Oliver characterized the development as “a technical achievement and an important first step.”
Further coverage of this historic event, including an interview with D-Wave co-founder and CTO Geordie Rose, is available here.
Cray Debuts GPU-CPU Supercomputer
The newest Cray supercomputing system, called the Cray XK6, relies on processor technology from AMD and NVIDIA to achieve a true hybrid design that offers up to 50 petaflops of compute power. Launched at the 2011 Cray User Group (CUG) meeting in Fairbanks, Alaska, the supercomputer employs a combination of AMD Opteron 6200 Series processors (code-named “Interlagos”) and NVIDIA Tesla 20-Series GPUs, and provides users with the option to run applications with either scalar or accelerator components.
The XK6 is the first Cray system to incorporate GPU acceleration, a point that Barry Bolding, vice president of Cray’s product division, highlighted:
“Cray has a long history of working with accelerators in our vector technologies. We are leveraging this expertise to create a scalable hybrid supercomputer — and the associated first-generation of a unified x86/GPU programming environment — that will allow the system to more productively meet the scientific challenges of today and tomorrow.”
Cray already has its first customer; the Swiss National Supercomputing Centre (CSCS) in Manno, Switzerland, is upgrading its Cray XE6m system, nicknamed “Piz Palu,” to a multi-cabinet Cray XK6 supercomputer.
The Cray XK6, which is scheduled for release in the second half of 2011, will be available in both single and multi-cabinet configurations and scales from tens of compute nodes to tens of thousands of compute nodes. Upgrade paths will be possible for the Cray XT4, Cray XT5, Cray XT6 and Cray XE6 systems.
For additional insight into this Cray first, check out our feature coverage.
PSC, HP Labs Speed Machine Learning Algorithm with GPUs
Researchers from the Pittsburgh Supercomputing Center (PSC) and HP Labs have figured out how to speed up key machine-learning algorithms using the power of GPU computing. Specifically, the team has achieved speed-ups of nearly 10 times with GPUs versus CPU-only code, and more than 1,000 times versus an implementation in an unspecified high-level language. Machine learning is a branch of artificial intelligence that “enables computers to process and learn from vast amounts of empirical data through algorithms that can recognize complex patterns and make intelligent decisions based on them.”
The algorithm the research team is working with is k-means clustering, which is popular in data analysis and “one of the most frequently used clustering methods in machine learning,” according to William Cohen, professor of machine learning at Carnegie Mellon University.
Ren Wu, principal investigator of the CUDA Research Center at HP Labs, developed the GPU-accelerated cluster algorithms. Wu then teamed up with PSC scientific specialist Joel Welling to test the algorithms on a real-world problem, which used data from Google’s “Books N-gram” dataset. This type of N-gram problem is common in natural-language processing. The researchers clustered the entire dataset, with more than 15 million data points and 1,000 dimensions, in less than nine seconds. This kind of breakthrough will allow future research to explore the use of more complex algorithms in tandem with k-means clustering.
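For readers unfamiliar with k-means, the following is a minimal CPU-only sketch of the standard textbook procedure (often called Lloyd's algorithm) in plain Python. It is not the GPU-accelerated code developed by Wu, whose details are not given here. The function name, the fixed iteration count and the toy data are assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster equal-length tuples into k groups (Lloyd's algorithm)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum(
                    (a - b) ** 2 for a, b in zip(p, centroids[c])
                ),
            )
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, assignments


if __name__ == "__main__":
    # Toy data: two well-separated groups of three 2-D points.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels = kmeans(data, k=2)
    print(centers)
    print(labels)
```

The assignment step computes a distance from every point to every centroid, and each of those calculations is independent of the others. That independence is what makes the algorithm a natural fit for the massively parallel hardware the researchers used.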
Lean Budget Increases Government Reliance on Modeling and Simulation
The Institute for Defense & Government Advancement (IDGA) put out a brief statement last week, suggesting a link between declining budgets and a growing demand for modeling & simulation (M&S) tools.
Last week, the Army and Department of Defense (DoD) awarded a $2.5 billion contract to Science Applications International Corporation (SAIC) for a combination of planning, modeling, simulation and training solutions. According to the IDGA, “this contract signifies the growing need for simulation training to prepare troops for combat. Despite budget constraints, Modeling and Simulation (M&S) is expanding as technological improvements develop. M&S is the more viable and cost-effective option for tomorrow’s armed forces.”
The IDGA also announced that its 2nd Annual Modeling and Simulation Summit will explore the latest technological advancements and look at the lessons to be learned from recent efforts. This event will have a focus on military strategies for M&S, such as Irregular Warfare and Counter-IED training.