The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
August 24, 2007
In a series of three articles, the High-End Crusader ponders the future impact of industry's ever-evolving many-core technology on both parallel computing and heterogeneous processing. In the first article, he explains the meltdown of monolithic, monothreaded, out-of-order scalar processors as vehicles for delivering steadily increasing performance.
The high-end computing community needs to reconceptualize both parallel computing and heterogeneous processing in tandem with industry's unsteady progression from multicore infancy to many-core adolescence to the full maturity of nanocore.
Nanocore is a notional scaling range for the number of cores in a chip multiprocessor. The low end of the range is the inflection point, sometime after 64 cores, when wholly innovative microarchitectural strategies are required to scale further. The high end of the range is the time, perhaps in 2014, when we can integrate 1,024 cores on a single processor die.
The conventional justification of heterogeneous processing, which has much to recommend it, is that distinct processor types, with distinct core execution models, can produce higher execution efficiencies on modern applications composed of disparate subcomputations with disparate algorithmic characteristics and thus disparate architectural needs. Standard metrics of execution efficiency include operations per second per dollar and operations per second per watt. A slogan that nicely captures this line of thinking is: make the common parallelism case fast and low power, tightly integrate all parallelism cases so that there is minimum overhead in "switching" among them, and scale to the heavens. The slogan goes nicely with execution efficiency.
Note that "tight integration" is first and foremost tight integration among the disparate cores on a heterogeneous processor die.
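To make the two efficiency metrics named above concrete, here is a minimal Python sketch. The `Processor` class and every figure in it (throughput, price, power draw) are hypothetical values chosen only to illustrate the arithmetic; none of them comes from a real device or from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    """A hypothetical processor type; all numbers are illustrative assumptions."""
    name: str
    sustained_ops_per_s: float  # sustained operations per second
    cost_dollars: float         # acquisition cost in dollars
    power_watts: float          # power draw in watts

    def ops_per_s_per_dollar(self) -> float:
        # Cost efficiency: sustained throughput bought per dollar spent.
        return self.sustained_ops_per_s / self.cost_dollars

    def ops_per_s_per_watt(self) -> float:
        # Power efficiency: sustained throughput delivered per watt drawn.
        return self.sustained_ops_per_s / self.power_watts


# Two made-up processor types with distinct core execution models.
scalar = Processor("scalar", sustained_ops_per_s=1e11, cost_dollars=1000.0, power_watts=100.0)
accel = Processor("accelerator", sustained_ops_per_s=1e12, cost_dollars=2000.0, power_watts=200.0)

for p in (scalar, accel):
    print(f"{p.name}: {p.ops_per_s_per_dollar():.3g} ops/s/$, "
          f"{p.ops_per_s_per_watt():.3g} ops/s/W")
```

With these invented inputs, the accelerator is five times more efficient than the scalar core on both metrics, but only on a subcomputation it can actually sustain at that rate. That dependence on the workload is the point of the heterogeneous argument.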
On occasion, provision is made for locality. For example, there is an admirable special-purpose machine in development that optimizes short-range communication at the expense of long-range communication, targeting strongly localizable applications. In contrast, a general-purpose heterogeneous-processing system achieves high execution efficiencies on a broad range of applications, no matter what their parallelism or locality characteristics.
Another observation is that the vast cloud of systems collectively known as "high-performance computers" has bifurcated into two effectively disjoint sets, depending on whether government funding is driving performance regimes well above any level the private sector would consider a market sweet spot.
This sweet spot is constantly evolving; it might be anywhere from a few tens to a few hundreds of sustained TFs/s in 2011. Pseudo-commercial super clusters are red herrings that obscure this clear picture.
Be this as it may, only a handful of systems are scaling ambitiously. The reference machine is, of course, Japan's Keisoku Keisanki, which is running in the Riken lab today at 2 PFs/s. This is a machine whose heroic _useful_ scalability is due to brilliantly engineered integration of heterogeneous processors. Here is the breakdown of its 10 PFs/s. First, 1.5 PFs/s comes from (presumably out-of-order) scalar processors. Then, another 0.5 PFs/s comes from vector processors. Finally, the heavy lifting is done by "Grape-7", which is an array of identical special-purpose devices (SPDs) specially optimized for finite-element analysis. This SPD array makes up the remaining 8 PFs/s.
Sustained performance will be remarkably high on each of 21 preselected applications.
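As a quick check on the breakdown quoted above, the following Python sketch sums the three contributions and computes each one's share of the total. The dictionary keys are labels invented for this illustration; the PFs/s figures are the ones given in the text.

```python
# Performance contributions in PFs/s, as stated in the article.
# The key names are illustrative labels, not official component names.
breakdown_pfs = {
    "scalar": 1.5,
    "vector": 0.5,
    "spd_array": 8.0,
}

total_pfs = sum(breakdown_pfs.values())

# Fraction of the total contributed by each processor type.
shares = {name: pfs / total_pfs for name, pfs in breakdown_pfs.items()}

print(f"total: {total_pfs} PFs/s")
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

The three parts sum to the quoted 10 PFs/s, with the special-purpose array accounting for 80 percent of it.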