The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
December 02, 2009
On Wednesday Intel shifted its Tera-scale Computing Research Program into second gear by demonstrating a 48-core x86 processor. The company intends to use the new chip as a research platform to light a fire under manycore computing.
According to Intel, the new chip boasts 1.3 billion transistors and is built on 45nm CMOS technology. Its distinction is that it contains the largest number of Intel Architecture (IA) cores ever assembled on a single microprocessor. As such, it represents the sequel to Intel's 2007 "Polaris" 80-core prototype that was based on simple floating point units. While the latter chip was said to reach 2 teraflops, the company is not talking about performance for the 48-core version.
It's worth mentioning that Intel is not blazing completely new territory here. Tilera already offers 32- and 64-core general-purpose processors (albeit non-x86) and previewed a 100-core version in October. Those chips are aimed at digital multimedia applications, networking gear, wireless infrastructure, and cloud computing.
Intel's 48-core offering is not intended for commercial use at all. Rather, it will be used to help software researchers figure out how real applications can scale from dozens to thousands of cores. It can also be used as a testbed to experiment with new parallel computing models and applications. Intel plans to distribute at least 100 of the experimental chips to commercial and academic researchers. Since the new chip incorporates "fully functional" IA (32-bit) cores, existing software should be relatively easy to port.
Intel is labeling the device a "Single-chip Cloud Computer" (SCC), presumably to emphasize the processor's resemblance to a shrink-wrapped datacenter. It's probably more accurate to simply call it a cluster-on-a-chip, considering it is essentially a bunch of cores hooked together by an on-chip network. Specifically, the processor contains 24 dual-core tiles, arranged in a two-dimensional 6-by-4 layout. Main memory is accessed via four on-chip DDR3 memory controllers. Each tile comes with its own router that connects the tiles to the network fabric.
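To make the layout concrete, here is a minimal Python sketch of the 24-tile, 48-core grid described above. The core numbering (two consecutive core IDs per tile, tiles numbered row by row), the orientation (six rows of four tiles), and the use of Manhattan distance as a hop count are all assumptions made for illustration; the article does not describe Intel's actual numbering or routing scheme.

```python
# Toy model of the SCC tile grid. Orientation, numbering, and the hop
# metric are illustrative assumptions, not Intel's documented scheme.
ROWS, COLS = 6, 4            # assumed: 6 rows of 4 tiles (24 tiles)
CORES_PER_TILE = 2           # dual-core tiles, per the article
TOTAL_CORES = ROWS * COLS * CORES_PER_TILE


def tile_of(core_id):
    """Return the (row, col) of the tile holding a core (hypothetical numbering)."""
    if not 0 <= core_id < TOTAL_CORES:
        raise ValueError(f"core_id must be in [0, {TOTAL_CORES})")
    tile_index = core_id // CORES_PER_TILE
    return divmod(tile_index, COLS)


def hops(core_a, core_b):
    """Router-to-router hops between two cores, assuming a 2D mesh
    where a message moves one tile per hop along rows and columns."""
    (row_a, col_a), (row_b, col_b) = tile_of(core_a), tile_of(core_b)
    return abs(row_a - row_b) + abs(col_a - col_b)


if __name__ == "__main__":
    print(TOTAL_CORES)       # 48
    print(tile_of(47))       # (5, 3): the last tile under this numbering
    print(hops(0, 47))       # 8: corner to corner
    print(hops(0, 1))        # 0: both cores share one tile and router
```

Under these assumptions, the worst-case path crosses eight routers, which hints at why placement of communicating tasks matters on a mesh of this size.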

Probably the most important feature of the network is its hardware support for message passing, which should provide a very high-performance environment for many cluster applications. Alternatively, a software-managed shared-memory model may be employed by codes that communicate via global data.
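The message-passing style can be sketched in a few lines of Python. This is only an analogy: threads and `queue.Queue` inboxes stand in for cores and hardware message buffers, and the names `send`, `recv`, and `parallel_sum` are hypothetical, not part of any Intel interface.

```python
import queue
import threading

NUM_CORES = 4  # small hypothetical count for the demo; the SCC has 48

# One inbox per simulated core, standing in for a per-core message buffer.
inboxes = [queue.Queue() for _ in range(NUM_CORES)]


def send(dest, payload):
    """Deliver a message to the destination core's inbox."""
    inboxes[dest].put(payload)


def recv(core):
    """Block until a message arrives in this core's inbox."""
    return inboxes[core].get()


def worker(core, data):
    """Sum this core's strided slice of the data and send the result to core 0."""
    send(0, sum(data[core::NUM_CORES]))


def parallel_sum(data):
    """Reduce partial sums using messages only; workers share no mutable state."""
    threads = [threading.Thread(target=worker, args=(c, data))
               for c in range(NUM_CORES)]
    for t in threads:
        t.start()
    total = sum(recv(0) for _ in range(NUM_CORES))  # one message per worker
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(100))))  # 4950
```

Because each worker communicates only through explicit messages, the same structure maps naturally onto cluster codes, which is the class of application the article says should benefit.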
Each core contains its own L2 cache. But unlike most modern CPU designs, the SCC doesn't offer hardware cache coherence. Instead, it offloads that task to the software, which has to coordinate reads and writes between all the caches. In this case, Intel opted to trade programming ease for hardware simplicity.
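What software-managed coherence demands of the programmer can be shown with a toy model. The sketch below is a simplified illustration, not the SCC's actual mechanism: the `Core` class, its write-back policy, and the `flush` and `invalidate` operations are assumptions chosen to show the general idea.

```python
class Core:
    """Toy core with a private write-back cache and no hardware coherence."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory (a dict of addr -> value)
        self.cache = {}        # private cached copies
        self.dirty = set()     # addresses written locally but not yet flushed

    def read(self, addr):
        # A cache hit returns the local copy, even if memory has changed.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Writes stay in the private cache until software flushes them.
        self.cache[addr] = value
        self.dirty.add(addr)

    def flush(self, addr):
        # Software-initiated write-back to shared memory.
        if addr in self.dirty:
            self.memory[addr] = self.cache[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        # Drop the local copy (discarding any unflushed write) so the
        # next read refetches from memory.
        self.cache.pop(addr, None)
        self.dirty.discard(addr)


def demo():
    memory = {0x10: 1}
    producer, consumer = Core(memory), Core(memory)
    consumer.read(0x10)            # consumer now caches the value 1
    producer.write(0x10, 2)        # update is visible only to the producer
    stale = consumer.read(0x10)    # still the old cached value
    producer.flush(0x10)           # software pushes the update to memory
    consumer.invalidate(0x10)      # software discards the stale copy
    fresh = consumer.read(0x10)    # refetched from memory
    return stale, fresh


if __name__ == "__main__":
    print(demo())  # (1, 2)
```

Omitting either the flush or the invalidate leaves the consumer with the stale value, which is exactly the bookkeeping that hardware coherence would otherwise hide.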
Fine-grained power management allows the processor to scale from a low of 25 watts up to a maximum of 125 watts. Each tile can run at a different frequency, while each row of four tiles can be run at a different voltage. There are also voltage and frequency controls for the network and the memory controllers. Like cache coherence, all power management is controlled via software.
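The two granularities described here, per-tile frequency and per-row voltage, can be modeled roughly as follows. Everything in this sketch is an assumption for illustration: the `PowerState` class is hypothetical, settings are normalized (1.0 means nominal) rather than real volts or gigahertz, and the power estimate uses only the textbook rule that dynamic power scales with frequency times voltage squared. It says nothing about the chip's actual 25 to 125 watt figures.

```python
from dataclasses import dataclass, field

ROWS, TILES_PER_ROW = 6, 4   # assumed: six voltage rows of four tiles each


@dataclass
class PowerState:
    """Normalized settings: 1.0 is nominal voltage or frequency."""
    row_voltage: list = field(
        default_factory=lambda: [1.0] * ROWS)
    tile_freq: list = field(
        default_factory=lambda: [[1.0] * TILES_PER_ROW for _ in range(ROWS)])

    def set_row_voltage(self, row, volts):
        # Voltage is shared by all four tiles in a row.
        if volts <= 0:
            raise ValueError("voltage must be positive")
        self.row_voltage[row] = volts

    def set_tile_frequency(self, row, col, freq):
        # Frequency can differ for every tile.
        if freq < 0:
            raise ValueError("frequency must be non-negative")
        self.tile_freq[row][col] = freq

    def relative_dynamic_power(self):
        """Sum of f * V^2 over all tiles, in units of one nominal tile."""
        return sum(
            freq * self.row_voltage[row] ** 2
            for row in range(ROWS)
            for freq in self.tile_freq[row]
        )


if __name__ == "__main__":
    state = PowerState()
    print(state.relative_dynamic_power())   # 24.0: all tiles at nominal
    state.set_row_voltage(0, 0.5)           # halve the voltage on one row
    print(state.relative_dynamic_power())   # 21.0: that row drops from 4.0 to 1.0
```

The quadratic dependence on voltage is why a per-row voltage control can save more power than frequency scaling alone, even though it is the coarser of the two knobs.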
During the Wednesday unveiling, Intel CTO Justin Rattner demonstrated a number of programs running on the new chip, including a Black-Scholes financial analytics code, a JavaScript-driven physics cloth modeling application, linear algebra and fluid dynamics codes, and a Hadoop Web search application. In general, these applications are the same ones you might find on a typical cluster in a datacenter. Their utility here is that they all represent software that scales exceedingly well across processing elements.
The potential application area is actually much larger than the examples cited above. Given that CPUs with dozens of cores will eventually be a mainstay across most IT market segments, it is anticipated that new parallel computing applications will emerge as manycore makes its way from servers into the desktop and mobile arena. At least that's what Intel is hoping.
Details about the new microprocessor will be formally presented at the International Solid-State Circuits Conference on February 8 in San Francisco.