With no end in sight for multicore CPUs and manycore GPUs, and supercomputers with hundreds of thousands of processors being envisioned, the parallel programming problem looms large indeed. IDC’er Steve Conway, writing for Scientific Computing, reminds us just how bad the problem has become:
To date, three real-world applications have broken the petaflop barrier (10^15 calculations/second), all on the Cray "Jaguar" supercomputer at the Department of Energy's Oak Ridge National Laboratory. A slightly larger number have surpassed 100 teraflops (10^14 calculations/second), mostly on IBM and Cray systems, and a couple of dozen additional scientific codes are being groomed for future petascale performance. All of these applications are inherently parallel enough to be laboriously decomposed (sliced and diced) for mapping onto highly parallel computers.
His point is that high performance computing applications, in general, are remarkable underachievers, given the top-end hardware available today. According to IDC surveys, over half of HPC applications don't scale beyond 8 processors, and a scant 6 percent can use more than 128 processors. Besides the disconnect between growing hardware and software parallelism, Conway also points to a couple of other problems afflicting today's HPC systems, namely slower processor clock speeds and the growing imbalance between processor cores and bandwidth (memory and I/O). These factors also need to be taken into account when devising software for modern HPC machines.
Not surprisingly, Conway thinks HPC software will have to be rewritten, as disruptive a prospect as that is, to take advantage of the current crop of multi-teraflop and petaflop systems, to say nothing of the future multi-petaflop and exaflop machines. Being a good glass-half-full analyst, he also sees opportunity, noting that those who can create the next generation of software tools and applications that keep pace with the hardware will find themselves at the top of the HPC heap.