July 31, 2008
Come gather round people wherever you roam
And admit that the waters around you have grown
And accept it that soon you'll be drenched to the bone
If your time to you is worth saving
Then you'd better start swimming or you'll sink like a stone
For the times they are a-changin'
-- Bob Dylan
These are interesting times for the microprocessor industry. At the same time the multicore revolution is happening, we're also seeing the rise of data parallel architectures. Yes, vector computing is back, but this time, it's not just for nerds.
In a recent Linux Magazine article on processor trends, Doug Eadline wrote that mainstream computing is splitting into two architectural paths: general-purpose multicore CPUs and data parallel engines -- what Eadline calls parallel/predictable computing units. The latter include GPUs, the Cell processor, and the upcoming Larrabee processors. To that list we could also add FPGAs and custom ASICs like the ClearSpeed devices.
General-purpose computing is great for software like word processors and operating systems, where the nature of the task is unpredictable from one moment to the next and data-intensive operations are absent. This type of code is strewn with "if-then-else" statements to handle fine-grained complexity. Predictable computing, on the other hand, is well-suited to multimedia apps and most types of HPC, where high levels of data parallelism can be exploited. If your code contains a lot of "for" loops that process large tables of data, you could probably benefit from data parallelism.
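To make the distinction concrete, here is a minimal sketch (my own illustration, not taken from Eadline's article) of the same array operation written both ways: as a serial loop for a general-purpose core, and as a CUDA kernel in which each of thousands of GPU threads handles one element. The function names are invented for the example.

// Serial version: one general-purpose core walks the whole table,
// one element at a time.
void saxpy_serial(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Data-parallel version: each GPU thread handles a single element,
// so large arrays are processed in parallel rather than sequentially.
__global__ void saxpy_kernel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

The branchy, control-heavy code in a word processor has no such regular structure to exploit, which is why it stays on the general-purpose cores.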
The reason CPUs have dominated the computing landscape for so long is that all applications need some sort of program control, and any data-heavy for-loops could always be implemented serially. Today, though, a high-end computer game wouldn't be practical without a GPU or game processor. And as visual and audio media become commonplace on the Internet and in mobile devices, clients and servers will need to be equipped with chips that can process large arrays of data in real time. Data parallelism will become a requirement practically everywhere.
The same goes for high performance computing. For example, with GPU-equipped systems, we're seeing HPC codes like seismic analysis and molecular dynamics accelerated by up to two orders of magnitude compared to CPU-based systems. The extra computing power is opening up HPC applications to a much larger audience. At the high end, the Cell-based Roadrunner has put the petaflop supercomputer on the map, and NVIDIA GPU-accelerated supers are on the drawing board.
The rise of multimedia applications and the growth of HPC mean that data parallel processors are aimed at some of the hottest markets. True, multimedia will drive the volume, but HPC will help pull these processors up the performance curve, as it has already done with the Cell processor. Every chip vendor is aware of this. The processor realignment explains why AMD bought ATI, why NVIDIA is expanding its lineup for the mobile and HPC markets, why Intel is making a foray into high-end visual computing with Larrabee, and why IBM is quickly constructing an ecosystem around the Cell processor.
As TG Daily's Theo Valich pointed out, it appears that for the first time GPUs will be implemented on a smaller manufacturing technology than CPUs. According to him, both NVIDIA and AMD will use Taiwan Semiconductor Manufacturing Company fabs to start churning out GPU silicon on the 40nm process node in early 2009. Intel's CPUs are currently at 45nm, and the company's move to 32nm is unlikely to happen until the second half of 2009. The five-nanometer edge for GPUs would be mostly symbolic, but as Valich notes, AMD and NVIDIA will probably make a big deal about it.
So where is this leading? Eadline believes the optimal platform for highly parallel (predictable) applications will turn out to be a single general-purpose core hooked up to some number of parallel processing engines. The Cell processor, with a PowerPC core surrounded by eight SPEs, is the current example. Larrabee will likely be a more tightly integrated version of this, with a wide SIMD unit built into each core -- more like a vector-enhanced manycore CPU. AMD and NVIDIA are dabbling with integrated CPU-GPU chips, but the first generation is aimed at the low end (mobile clients). There are no public plans to integrate a CPU core into AMD's FireStream or NVIDIA's Tesla HPC platforms.
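For a sense of how that host-plus-engines split looks in practice today, here is a minimal sketch of the offload pattern using CUDA, in which the general-purpose host core does nothing but orchestrate -- allocate, copy, launch, copy back -- while the parallel engines do the arithmetic. The kernel and the numbers are invented for illustration and aren't taken from any of the platforms above.

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Trivial kernel standing in for the "parallel engines" side of the design.
__global__ void scale(int n, float a, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] *= a;
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // The general-purpose core only orchestrates: allocate, copy, launch...
    float *d;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    // ...while the data parallel engines chew through a million elements.
    scale<<<(n + 255) / 256, 256>>>(n, 2.0f, d);

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expected 2.0)\n", h[0]);

    cudaFree(d);
    free(h);
    return 0;
}

Whether the engines are SPEs, GPU shader cores, or Larrabee's SIMD units, the division of labor is the same: control flow on the general-purpose core, bulk data on the parallel hardware.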
The discrete CPU will be around for a while, though. There is plenty of non-technical software that just needs a handful of cores -- or even just one -- to run at peak efficiency. Vanilla desktop systems and virtualized enterprise servers, equipped with multicore CPUs, will handle these apps just fine. It's the cutting-edge applications that will require these new massively parallel architectures.
In August, there are a bunch of conferences that will feature some of the latest goings-on in the data parallel realm. SIGGRAPH, the HOT CHIPS symposium, the Intel Developer Forum, and NVIDIA's NVISION 08 conference will have a lot to say about the new processor landscape and how it's being shaped by emerging applications. I'll be following the events over the next few weeks and will give you my take on the developments.
Posted by Michael Feldman - July 30, 2008 @ 9:00 PM, Pacific Daylight Time
Michael Feldman is the editor of HPCwire.