The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
March 02, 2007
To explore the important new paper on the challenges of parallelism, "The View from Berkeley," HPCwire talked with NERSC computer scientist John Shalf and David Patterson, professor of computer science at UC Berkeley. Both are among the paper's co-authors.
HPCwire: To what extent has the HPC community learned how to exploit hardware and software parallelism during the past 20 years? Where do things stand today?
Shalf: When the HPC community migrated from vector to parallel machines in the early 1990s, the transition was extremely difficult for the first five years. Now, 80 percent to 90 percent of codes have made that transition to MPPs [massively parallel processors] and the community has developed a substantial portfolio of parallel numerical algorithms.
As things stand today, the HPC community has become accustomed to modest increases in system concurrency over the past 15 years. For that matter, the desktop community has become accustomed to virtually no parallelism. As clock frequencies stall, future performance improvements will depend on accelerating the pace of parallelism -- doubling the concurrency of computer systems of all scales every 18 months! The assumptions on which the current generation of codes is founded will break very rapidly under these conditions. The software changes necessary to ride this wave of exponentially increasing parallelism will be at least as substantial as the transition from vector to MPP systems.
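To put the 18-month doubling in numbers, here is a minimal sketch of the arithmetic. It assumes a clean doubling every 1.5 years, which is an idealization of the trend Shalf describes; the function name and the sample time spans are illustrative and do not come from the report.

```python
def concurrency_growth(years, doubling_period_years=1.5):
    """Factor by which concurrency grows if it doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)

# Growth factors over a few illustrative time spans (assumed, not from the report).
for years in (3, 6, 15):
    print(f"after {years} years: {concurrency_growth(years):.0f}x the concurrency")
```

Under that assumption, a code that runs on a given number of cores today would need to scale to roughly a thousand times as many within 15 years.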
Patterson: The industry is already betting on multicore for future improvements in computing performance. To use a football analogy, the computing industry has already thrown a "Hail Mary" pass with the first round of multicore designs. The ball is in the air, but nobody is running yet. That's where things stand today.
HPCwire: Your report is called the "View from Berkeley." What is the view from Berkeley about the challenges of future parallel architectures?
Patterson: The overarching challenge is that we need to find ways to make it easy to write programs that run efficiently on manycore systems. If we don't succeed, then the future of the IT industry looks clouded, because the industry will then face diminishing returns on the value of buying new computers with more cores.
We also offer opinions on good paths to pursue. First, RISC, not CISC. Assuming we can program them, the most efficient hardware in FLOPS per watt and FLOPS per dollar is simple single-issue pipelined cores. Second, manycore, not multicore. We think the target should be hundreds to thousands of simple cores per socket, not four or eight. Third, autotuners, not compilers. We think generating parallel code by dynamically exploring the options heuristically on the target computer is a more promising path than producing code only via conventional compilers. Finally, human-centric, not machine-centric programming models. Psychological research on how people design and why people make mistakes shapes HCI [human-computer interaction] research, but not programming models. We think we should rely on experimental research from psychology to guide future parallel programming models.
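The autotuning idea Patterson mentions can be illustrated with a toy sketch: instead of fixing a parameter at compile time, time several variants of a kernel on the machine at hand and keep the fastest. Everything here is an assumption made for illustration, including the kernel (`blocked_sum_of_squares`), the candidate block sizes, and the exhaustive search; real autotuners typically search far larger spaces with heuristics, and this is not the report's own method.

```python
import time

def blocked_sum_of_squares(data, block):
    """Toy kernel: sum of squares computed block by block; `block` is the tunable parameter."""
    total = 0
    for start in range(0, len(data), block):
        total += sum(x * x for x in data[start:start + block])
    return total

def autotune(kernel, data, candidates, repeats=3):
    """Time every candidate on this machine; return (fastest_candidate, timings)."""
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(data, block)
            best = min(best, time.perf_counter() - t0)  # keep the best of several runs
        timings[block] = best
    return min(timings, key=timings.get), timings

data = list(range(20_000))
candidates = [64, 256, 1024, 4096]  # hypothetical search space
best_block, timings = autotune(blocked_sum_of_squares, data, candidates)
print("fastest block size on this machine:", best_block)
```

The winning block size can differ from one machine to the next, which is the point: the choice is made by measurement on the target hardware rather than by a static compiler model.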
Shalf: Underlying all of the arguments laid out in the report is the belief that manycore chip design is our ultimate path forward for future computing systems. We aren't so much wild-eyed advocates for the multicore approach as we are realists. I think Kurt Keutzer, one of the lead authors on the report, sums this up best when he says, "This shift toward increasing parallelism is not a triumphant stride forward based on breakthroughs in novel software and architectures for parallelism; instead, this plunge into parallelism is actually a retreat from even greater challenges that thwart efficient silicon implementation of traditional uniprocessor architectures." If you don't accept Kurt's statement at face value, the report provides substantial arguments to turn your opinion around. If you accept that the future of computing is manycore, then the Berkeley View explores the ramifications of that assumption in detail.
Convergence toward manycore for mainstream chips is already apparent. There is the new NVIDIA CUDA GPU, which is moving from the highly specialized pixel and vertex processors of the previous generation of GPUs to 128 more general-purpose cores. The recently announced Intel teraflop chip employs 80 simplified cores to hit one teraflop double-precision on a chip that consumes less than 70 watts. Cisco has moved away from its typical ASIC designs toward employing 192 Tensilica cores in the Metro chip, which is the heart of its new high-end CRS-1 router. The common thread is that using hundreds of simpler cores is more power-efficient than attempting to push the clock rate on a few complex cores.