On Wednesday Intel shifted its Tera-scale Computing Research Program into second gear by demonstrating a 48-core x86 processor. The company intends to use the new chip as a research platform to light a fire under manycore computing.
According to Intel, the new chip boasts 1.3 billion transistors and is built on 45nm CMOS technology. Its distinction is that it contains the largest number of Intel Architecture (IA) cores ever assembled on a single microprocessor. As such, it represents the sequel to Intel’s 2007 “Polaris” 80-core prototype that was based on simple floating point units. While the latter chip was said to reach 2 teraflops, the company is not talking about performance for the 48-core version.
It’s worth mentioning that Intel is not blazing completely new territory here. Tilera already offers 32- and 64-core general-purpose processors (albeit non-x86) and previewed a 100-core version in October. Those chips are aimed at digital multimedia applications, networking gear, wireless infrastructure, and cloud computing.
Intel’s 48-core offering is not intended for commercial use at all. Rather, it will be used to help software researchers figure out how real applications can scale from dozens to thousands of cores. It can also serve as a testbed for experimenting with new parallel computing models and applications. Intel plans to distribute at least 100 of the experimental chips to commercial and academic researchers. Since the new chip incorporates “fully functional” IA (32-bit) cores, existing software should port with relative ease.
Intel is labeling the device a “Single-chip Cloud Computer” (SCC), presumably to emphasize the processor’s resemblance to a shrink-wrapped datacenter. It’s probably more accurate to simply call it a cluster-on-a-chip, considering it is essentially a bunch of cores hooked together by an on-chip network. Specifically, the processor contains 24 dual-core tiles, arranged in a two-dimensional 6-by-4 layout. Main memory is accessed via four on-chip DDR3 memory controllers. Each tile comes with its own router that connects it to the network fabric.
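To make the tile arithmetic concrete, here is a minimal Python sketch of the layout described above. The tile and core counts come from Intel's description; everything else is an assumption made for illustration: the core numbering (two consecutive core IDs per tile, tiles numbered row by row), the choice of six rows of four tiles, and the idea that a message travels one router hop per tile boundary crossed.

```python
from typing import Tuple

# From the article: 24 dual-core tiles in a 6-by-4 arrangement.
# Assumption: six rows of four tiles each.
MESH_ROWS = 6
MESH_COLS = 4
CORES_PER_TILE = 2
NUM_CORES = MESH_ROWS * MESH_COLS * CORES_PER_TILE  # 48


def tile_of(core_id: int) -> Tuple[int, int]:
    """Return the (column, row) of the tile holding core_id.

    Hypothetical numbering: cores 0 and 1 share tile 0, cores 2 and 3
    share tile 1, and tiles are numbered left to right, row by row.
    """
    if not 0 <= core_id < NUM_CORES:
        raise ValueError(f"core_id must be in [0, {NUM_CORES})")
    tile = core_id // CORES_PER_TILE
    return tile % MESH_COLS, tile // MESH_COLS


def hops(src_core: int, dst_core: int) -> int:
    """Router-to-router hops between two cores, assuming a message
    moves only along mesh rows and columns (Manhattan distance)."""
    sx, sy = tile_of(src_core)
    dx, dy = tile_of(dst_core)
    return abs(sx - dx) + abs(sy - dy)
```

Under these assumptions, two cores on the same tile are zero hops apart, while cores at opposite corners of the mesh are eight hops apart.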
Probably the most important feature of the network is its hardware support for message passing, which should provide a very high-performance environment for many cluster applications. Alternatively, a software-managed shared-memory model may be employed by codes that communicate via global data.
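For readers unfamiliar with the message-passing style, the following Python sketch shows its general shape. It does not model Intel's hardware or any Intel API: the `MessageFabric` class, its `send` and `recv` methods, and the `ring_sum` example are hypothetical names invented here, with per-core software queues standing in for the on-chip network.

```python
from collections import deque


class MessageFabric:
    """Toy stand-in for an on-chip network: one inbox per core."""

    def __init__(self, num_cores: int):
        self.inboxes = [deque() for _ in range(num_cores)]

    def send(self, src: int, dst: int, payload) -> None:
        # Deliver the message to the destination core's inbox.
        self.inboxes[dst].append((src, payload))

    def recv(self, core: int):
        # Return the oldest pending (sender, payload) pair for this core.
        if not self.inboxes[core]:
            raise LookupError(f"no message pending for core {core}")
        return self.inboxes[core].popleft()


def ring_sum(num_cores: int = 48) -> int:
    """Sum all core IDs by passing a running total around a ring.

    No core reads another core's data directly; each one only
    receives a partial sum, adds its own ID, and forwards it.
    """
    fabric = MessageFabric(num_cores)
    fabric.send(0, 1 % num_cores, 0)  # core 0 contributes its ID, 0
    for core in range(1, num_cores):
        _, partial = fabric.recv(core)
        fabric.send(core, (core + 1) % num_cores, partial + core)
    _, total = fabric.recv(0)  # the total arrives back at core 0
    return total
```

The point of the style is that all sharing is explicit: data moves only when one core sends it and another receives it, which is why cluster codes written this way map naturally onto a chip like this.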
Each core contains its own L2 cache. But unlike most modern CPU designs, the SCC doesn’t offer hardware cache coherence. Instead, it offloads that task to the software, which has to coordinate reads and writes between all the caches. In this case, Intel opted to trade programming ease for hardware simplicity.
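What that coordination looks like in practice can be sketched with a small Python simulation. This is an illustration of the general technique, not Intel's actual mechanism: the `Core` and `SharedMemory` classes and the `flush` and `invalidate` operations are hypothetical, and the caches are modeled as plain dictionaries with write-back behavior.

```python
class SharedMemory:
    """Main memory visible to every core."""

    def __init__(self):
        self.data = {}


class Core:
    """A core with a private write-back cache and no hardware coherence."""

    def __init__(self, memory: SharedMemory):
        self.memory = memory
        self.cache = {}     # address -> locally cached value
        self.dirty = set()  # addresses written locally but not yet flushed

    def read(self, addr: int) -> int:
        # A cached value is returned even if memory has since changed.
        if addr not in self.cache:
            self.cache[addr] = self.memory.data.get(addr, 0)
        return self.cache[addr]

    def write(self, addr: int, value: int) -> None:
        self.cache[addr] = value
        self.dirty.add(addr)

    def flush(self) -> None:
        """Software step 1: the writer pushes its dirty data to memory."""
        for addr in self.dirty:
            self.memory.data[addr] = self.cache[addr]
        self.dirty.clear()

    def invalidate(self) -> None:
        """Software step 2: the reader discards clean cached copies."""
        self.cache = {a: v for a, v in self.cache.items() if a in self.dirty}


def demo():
    memory = SharedMemory()
    producer, consumer = Core(memory), Core(memory)
    first = consumer.read(0x10)   # consumer caches the initial value
    producer.write(0x10, 42)
    producer.flush()              # memory now holds 42
    stale = consumer.read(0x10)   # consumer still sees its old copy
    consumer.invalidate()
    fresh = consumer.read(0x10)   # refetched from memory
    return first, stale, fresh
```

Calling `demo()` shows the hazard: even after the producer flushes, the consumer keeps reading its stale copy until software explicitly invalidates it. With hardware coherence that bookkeeping is invisible; here the programmer or runtime has to get it right.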
Fine-grained power management allows the processor to scale from a low of 25 watts up to a maximum of 125 watts. Each tile can run at a different frequency, while each row of four tiles can be run at a different voltage. There are also voltage and frequency controls for the network and the memory controllers. Like cache coherence, all power management is controlled via software.
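The two levels of granularity are easy to picture as a data structure. The Python sketch below is purely illustrative: the `PowerConfig` class and its methods are hypothetical, not an Intel interface, and the frequencies and voltages used with it are placeholder numbers rather than the chip's real operating points. It captures only what is described above: frequency is set per tile, voltage per row of four tiles.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

NUM_ROWS = 6       # assumption: the 24 tiles form six rows
TILES_PER_ROW = 4  # "each row of four tiles"


@dataclass
class PowerConfig:
    """Software-visible power state: per-tile frequency, per-row voltage."""

    default_mhz: int
    default_volts: float
    tile_mhz: Dict[Tuple[int, int], int] = field(default_factory=dict)
    row_volts: Dict[int, float] = field(default_factory=dict)

    @staticmethod
    def _check(row: int, col: int = 0) -> None:
        if not (0 <= row < NUM_ROWS and 0 <= col < TILES_PER_ROW):
            raise ValueError(f"no tile at row {row}, column {col}")

    def set_tile_frequency(self, row: int, col: int, mhz: int) -> None:
        # Frequency can differ from one tile to the next.
        self._check(row, col)
        self.tile_mhz[(row, col)] = mhz

    def set_row_voltage(self, row: int, volts: float) -> None:
        # Voltage is shared by all four tiles in the row.
        self._check(row)
        self.row_volts[row] = volts

    def tile_setting(self, row: int, col: int) -> Tuple[int, float]:
        """Return the (frequency in MHz, voltage) in effect for one tile."""
        self._check(row, col)
        mhz = self.tile_mhz.get((row, col), self.default_mhz)
        volts = self.row_volts.get(row, self.default_volts)
        return mhz, volts
```

For example, after `cfg = PowerConfig(default_mhz=1000, default_volts=1.0)`, calling `cfg.set_tile_frequency(2, 1, 500)` slows a single tile, while `cfg.set_row_voltage(2, 0.8)` changes the voltage seen by every tile in row 2.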
The potential application area is actually much larger than the examples cited above. Given that CPUs with dozens of cores will eventually be a mainstay across most IT market segments, new parallel computing applications are expected to emerge as manycore makes its way from servers into the desktop and mobile arenas. At least that’s what Intel is hoping.
Details about the new microprocessor will be formally presented at the International Solid-State Circuits Conference on February 8 in San Francisco.