April 21, 2011
As Intel prepares to roll out its Many Integrated Core (MIC) technology for commercial production in 2012, it has managed to entice a major US supercomputing center to start porting some of its science codes to the new architecture. The Texas Advanced Computing Center (TACC) announced it has teamed up with the chipmaker and begun porting a handful of research applications to the pre-production "Knights Ferry" MIC processor. Later this year, TACC will build a cluster of such chips for further development, with the intent to deploy a system based on the commercial "Knights Corner" MIC processor when Intel starts production.
MIC represents Intel's entry into the HPC processor accelerator sweepstakes, as the company attempts to perform an end-run around GPU computing. Largely thanks to NVIDIA, GPU computing (aka GPGPU) has become a mainstream HPC solution over the last few years, across workstations, clusters and supercomputers. Developing software for those platforms, however, requires specialized programming environments such as CUDA and OpenCL.
As its name suggests, MIC puts many x86 cores on a single chip: more cores (but simpler ones) than a standard x86 CPU, along with an extra-wide SIMD unit for heavy-duty vector math and four-way SMT threading. As such, it's meant to speed up codes that can exploit much higher levels of parallelism than standard x86 parts offer.
Knights Ferry is Intel's development implementation, spun out of the chipmaker's abandoned Larrabee processor effort for visual computing. The chip sports 32 IA cores and runs at 1.2 GHz. Since each core supports four-way SMT (as opposed to the two-way HyperThreading on Xeons), each chip can manage up to 128 threads in parallel. Memory-wise, Knights Ferry has 8 MB of cache and 1 to 2 GB of GPU-flavored GDDR5 DRAM. Like its current GPGPU competition, Knights Ferry is meant to be hooked up to a PCIe bus, acting as a co-processor to a standard x86 CPU.
Knights Corner, Intel's first commercial version of MIC, will have more than 50 cores per chip and will be implemented on the company's 22nm process technology. No official date has been announced for the commercial launch, but according to a presentation by Intel research engineer Pradeep Dubey at the recent 2011 Open Fabrics International Workshop in Monterey, Knights Corner is slated for release sometime in the second half of 2012.
At this point, TACC is using the MIC software development kit (SDK), employing a Knights Ferry chip attached to a single machine. According to TACC's deputy director Dan Stanzione, they are planning to build a "relatively small" cluster of Knights Ferry-equipped nodes to test codes in a distributed computing environment before the end of the year.
On Thursday, I spoke with Stanzione, who was very upbeat about the new architecture, noting that the x86 compatibility is a big deal for TeraGrid researchers. In aggregate, they have a massive investment in their science codes, numbering in the hundreds.
"This is a way to get a dramatically better power per operation without having to throw out everything we know about software," he said, adding, "I'm really excited about this as a path forward. I think it has the potential to be a real game-changer."
One nice feature of MIC programming is that it inherently supports OpenMP, a popular parallel programming model for shared memory environments. And since Intel's HPC tool chain -- Parallel Studio and Cluster Studio -- has been extended to the MIC architecture, programmers can even stay in the same development environment for both their Xeon and MIC work -- which, of course, Intel would like very much.
The result is that OpenMP code written for four-core or six-core x86 CPUs, like some of the ones TACC has started porting, should move rather easily to a 32-core MIC co-processor. "Getting the codes to run the first time is pretty simple," Stanzione said, adding that when they move to the MIC cluster, they'll have to figure out how to layer an MPI distributed memory model on top of that.
According to him, they've already ported a bunch of benchmark codes and have started with the applications. One is a bio-modeling app, which attempts to detect epistatic interactions (how genes modify each other to express a phenotype) across a corn genome. The code was thousands of lines long, but because it was parallelized via OpenMP, it moved to MIC with minimal restructuring.
Although TACC has committed resources to the MIC effort, Stanzione said they are evaluating hardware and software accelerator approaches across the spectrum, most notably using CUDA and OpenCL on GPUs. (TACC's Longhorn supercomputer is currently the center's largest GPU platform, sporting 512 NVIDIA Tesla processors.) It's too early to compare performance on specific applications, but it's already apparent that porting is much simpler with Intel's offering.
"Moving a code to MIC might involve sitting down and adding a couple of lines of directives that takes a few minutes," explained Stanzione. "Moving a code to a GPU is a project."
Measuring performance is still a work in progress, but the early scaling results appear to be encouraging. According to Stanzione, doubling the number of MIC cores has roughly doubled the performance on some of the initial codes. They expect to be able to say a lot more about performance when they get the Knights Corner commercial parts.
From Intel's point of view, getting TACC to sign on to MIC development is a big boost for its manycore effort. Assuming the porting goes as planned, the chipmaker will be able to point to a nice set of proof points based on real-world HPC applications. According to John Hengeveld, Intel's director of technical compute marketing for its datacenter group, they'll be able to incorporate TACC's experience into the upcoming delivery of Knights Corner parts and software. "Having a partner that is helping us work on issues of scalability and optimization is really quite valuable," he explained.
Although TACC is the first big HPC organization with a committed roadmap for MIC development, they won't be the last. Intel currently has about 100 MIC developers scattered around, and according to Hengeveld, they'll be announcing some bigger collaborations in the months ahead. And as we get closer to MIC's commercial release, the news surrounding the new architecture should start to pick up. "We'll be talking a lot more about this at ISC," promised Hengeveld.
Posted by Michael Feldman - April 21, 2011 @ 8:22 PM, Pacific Daylight Time
Michael Feldman is the editor of HPCwire.