Commodity Processor Chaos or Convergence?

By Michael Feldman

December 1, 2006

In this week's issue of HPCwire, Scott Michel's feature article — GPGPU Computing And the Heterogeneous Multi-Core Future — does a nice job of discussing how commodity accelerators like GPUs and the Cell BE processor are helping to set the stage for heterogeneous multi-core computing. In doing so he provides some context for the emerging model of heterogeneous processing. He also talks about some of the important challenges that are being confronted, including software compatibility, compiler technologies and language environments. Scott hosted a general-purpose GPU computing tutorial workshop at last month's Supercomputing conference and was kind enough to share his thoughts on this evolving topic.

Reading Scott's article got me thinking about the “disruptive” nature of new technologies. Incompatible architectures, including multi-core x86 processors, the Cell BE processor, and GPU co-processors from ATI (now AMD) and NVIDIA, are all tempting targets, waiting to be exploited for their performance prowess. The adoption of these new processors is making for exciting times in the world of high performance computing, but from the software developer's point of view, it seems chaotic.

For these new processors to be successful, the average programmer must have access to a familiar development environment. This is especially important for architectures such as GPUs and the Cell, which up until recently were only programmable through low-level software environments for game developers and graphics coders. However, in one sense, all these architectures are converging: they are all going parallel. So the techniques used to program a GPU or Cell are similar to those used to program a standard homogeneous multi-core processor.

Two companies, PeakStream and RapidMind, are taking advantage of this commonality and each has built a software platform that targets these parallel architectures. PeakStream introduced their product back in September. RapidMind's offering is currently in beta, but seems to be close to a release date. I recently talked with the founders of both companies, Matthew Papakipos at PeakStream and Michael McCool at RapidMind, to get a sense of why these new parallel architectures are being mainstreamed now and where this trend is taking us.

Matthew Papakipos, PeakStream's founder and chief technology officer, has been intimately involved with GPUs for almost 10 years. He ran the GPU architecture group at NVIDIA from 1997 to 2003, the period when GPUs grew from simple graphics engines to general-purpose processors. This period parallels the rise of graphics processing in the computer and electronic games industry. Papakipos told me that when he started at NVIDIA in 1997, the company had 70 people. When he left, it had 2,500.

At the beginning, the GPU logic was all in hardware. The programmability was added later to get more generalized graphics functionality. Papakipos said that during the early years, NVIDIA was being inundated with requests for new features from all the game developers, like new fog modes, new color interpolation or bump mapping. Microsoft was leading the charge by demanding that games be more interesting looking.

“We realized it would be easier to make the chips programmable rather than give them all the crazy features they were asking for,” said Papakipos. “We were going down this path of adding all these bells and knobs and whistles that individual developers were asking for to differentiate the way their games looked.”

So making the devices programmable enabled the game developers to create their own visual effects via software. In 2000, NVIDIA introduced its first programmable chip, the NV20, which ended up in the first Xbox. ATI was going down the same path as NVIDIA with their GPU device. Over the years the graphics engines evolved to become more powerful and even more general-purpose.

“It's not like we set out to make a chip for high performance computing,” explained Papakipos, “but after adding enough features, we had a pretty general-purpose processor. And suddenly it became possible to do some interesting things with it in HPC.”

By 2003, people started to realize that GPUs might serve as commodity replacements for proprietary floating point vector processors, representing a real opportunity to bring these devices into the HPC world. Subsidized by legions of game enthusiasts, supercomputing hardware became “almost free.”

“The spark that set this off was a bunch of folks at Stanford who did some really good research in late 2004, on getting a real application to run on these GPUs,” said Papakipos. “That was the first time anybody had taken a real HPC application and gotten it to run on these graphic processors.”

The application was called ClawHMMER, which performs protein sequence matching. That work was done by Pat Hanrahan and was demonstrated over a year ago at SC05. A flurry of other applications were ported by the graphics research community. But Papakipos realized that only graphics programmers could figure out how to get the devices to do anything.

“There was a software gap and that's what led us to create PeakStream,” said Papakipos.

The PeakStream platform provides HPC-type APIs (similar to the Intel Math Kernel Library or the MATLAB interfaces) and developer tools (debuggers and profilers) for a C/C++ programming environment. Some real compiler work was required to make that happen. The API is the front door to a virtual machine that provides the JIT (just-in-time) compiler. The virtual machine retargets the code to the particular processor the user is running on.

The RapidMind software platform has a similar model. Like the PeakStream offering, it provides C++ programmers a high-level interface to data parallelism. RapidMind's runtime compiler generates the appropriate machine code for the target processor type.
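As a rough illustration of the model both platforms share (an element-wise operation written once against a high-level interface, with a runtime deciding how to execute it on the available hardware), consider the C++ sketch below. It is not the PeakStream or RapidMind API. The names `Target` and `parallel_map` are hypothetical, and the "retargeting" is only a choice between a serial loop and a multi-threaded loop on the CPU, standing in for the runtime code generation the real products perform.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical execution targets. A real platform would generate code for
// a GPU or the Cell; this sketch only distinguishes two CPU strategies.
enum class Target { Serial, MultiCore };

// Apply `op` to every element of `in` and return the results.
// The caller writes the element-wise operation once; the "runtime"
// (this function) decides how the work is spread over the hardware.
// `op` must be safe to call from several threads at the same time.
template <typename Op>
std::vector<float> parallel_map(const std::vector<float>& in, Op op,
                                Target target) {
    std::vector<float> out(in.size());

    // Process the half-open index range [begin, end).
    auto run_range = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) out[i] = op(in[i]);
    };

    if (target == Target::Serial || in.empty()) {
        run_range(0, in.size());
        return out;
    }

    // Split the index space into contiguous chunks, one per worker thread.
    // hardware_concurrency() may return 0, so fall back to one worker.
    std::size_t workers =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    workers = std::min(workers, in.size());
    const std::size_t chunk = (in.size() + workers - 1) / workers;

    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < in.size(); begin += chunk) {
        pool.emplace_back(run_range, begin,
                          std::min(begin + chunk, in.size()));
    }
    for (auto& t : pool) t.join();  // Wait for every chunk to finish.
    return out;
}
```

A caller would pass the same lambda, for example `[](float x) { return 2.0f * x; }`, with either `Target::Serial` or `Target::MultiCore` and get the same result back. Keeping the application code independent of the processor that runs it is the property both vendors are selling.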

Like Papakipos, Michael McCool, co-founder and chief scientist at RapidMind, realized that non-graphics programmers would require a more familiar development environment to be able to apply GPUs and the Cell to a broader set of applications. McCool, a professor at the Computer Graphics Lab at the University of Waterloo, has done research into advanced programming interfaces for graphics processors. This research, funded by the CITO, resulted in a programming system called Sh. The Sh system enabled developers to use the GPU co-processors in a PC for both graphics and general-purpose computing applications. In 2004, McCool and Stefanus Du Toit co-founded Serious Hack Inc. to commercialize this technology. Since then, the company has been renamed RapidMind.

And like his PeakStream counterpart, McCool also sees GPUs evolving towards greater and greater generality. With each new generation he sees them looking more like vector or stream co-processors.

“GPUs were actually capable of doing all this stuff a year ago but it wasn't until the X1900 and the 7000 series GPUs, from ATI and NVIDIA respectively, that there was enough of a performance leap to make it worthwhile,” explained McCool. “You needed that order of magnitude. Also, it took a year for the tools and for the applications to be written at the commercial level.”

The evolution of the GPU over the past five years has been dramatic and should continue to be so for the foreseeable future. Not only will greater performance be available, but new capabilities as well. The addition of double precision floating point hardware to the GPU (recently announced by NVIDIA for a 2007 device) will be especially important for HPC applications that require 64-bit FP accuracy and should further accelerate industry adoption. It's still unclear how quickly the commodity markets will drive GPUs into the double precision realm. So far, game developers have been very resourceful with single precision.

“But there are other limitations in the GPU,” noted McCool. “For example, you have floating point but no integers, which turns out to be a real pain in the neck. So in RTT's ray tracer we had to worry about floating point round-off error in our pointers. The next generation of GPUs will make those kind of weird problems go away.”
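The round-off problem McCool describes follows from the number format itself: an IEEE 754 single precision value carries a 24-bit significand, so it can represent every integer exactly only up to 2^24 (16,777,216). The C++ sketch below assumes that an array index has to be stored in a 32-bit float, which is what McCool's remark suggests, although the details of the actual ray tracer are not given here. The function name `survives_float_round_trip` is hypothetical.

```cpp
#include <cstdint>

// Returns true if `index` is unchanged after being stored in a 32-bit
// float, which is how an integer "pointer" would have to be kept on
// hardware that offers only floating point arithmetic.
bool survives_float_round_trip(std::uint32_t index) {
    const float as_float = static_cast<float>(index);
    // Compare in double precision: a double holds every 32-bit integer
    // and every float exactly, so the comparison itself adds no error.
    return static_cast<double>(as_float) == static_cast<double>(index);
}
```

Under these assumptions, index 16,777,216 survives the round trip but 16,777,217 does not: it rounds to 16,777,216, so two distinct addresses collapse into one.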

Compared to a GPU, which is more akin to a co-processor, the Cell processor represents a more complex architecture, consisting of a PowerPC core with eight synergistic processing elements (SPEs) and a local memory store. The Cell design lends itself to more complex computations than might be feasible with a GPU.

This week, Gianni De Fabritiis, a researcher with the Computational Biochemistry and Biophysics Lab (GRIB-IMIM/UPF) in the Barcelona Biomedical Research Park, published a white paper (http://arxiv.org/PS_cache/physics/pdf/0611/0611201.pdf) describing a molecular dynamics simulation application that achieved 30 gigaflops sustained performance on a Cell BE, representing an order of magnitude improvement when compared to a standard scalar CPU. The only notable downside was the effort required to change the application's software model. Concludes De Fabritiis:

“The cost of this effort cannot be underestimated, but the performance obtainable compared to a traditional processor is about 20 times faster for the realistic case of molecular dynamics of biomolecules. Similar results are also possible for other computing intensive scientific and technological problems, such as computational fluid dynamics, systems biology and Monte Carlo methods for finance.”

He continues:

“New multi-core standard processors will need to show that they can reach similar performance levels at the same cost. The implications of this technology for science are also important. Without a doubt it expands the frontier of scientific computing while lowering the cost of entry in terms of the computational infrastructure required to run molecular based software.”

There's a notion that GPUs, the Cell and x86 architectures are actually converging. PeakStream's Papakipos thinks the Cell BE and AMD's future “Fusion” (CPU-GPU) processor are part of a larger phenomenon that will transform general-purpose computing. He envisions CPUs becoming more GPU-like, and processors evolving into architectures that include a large number of cores, distributed memory, NUMA (Non-Uniform Memory Access) and SIMD (Single Instruction Multiple Data) hardware. Even the 80-core prototype Intel talked up at the Intel Developer Forum this September follows this same general pattern.

“There's a convergence starting to happen between multi-core x86 processors, GPUs and the Cell processor,” said Papakipos. “If you look at those three processors today, they all look pretty different. But if you look forward a few years, they're all going to the same place.”

—–

As always, comments about HPCwire are welcomed and encouraged. Write to me, Michael Feldman, at [email protected].
