March 04, 2010
As many industry watchers, including me, have noted, the next few months will see the introduction of a raft of new x86 server chips that offer between 6 and 12 cores. Although both Intel and AMD have already fielded 6-core processors ("Dunnington" for Intel and "Istanbul" for AMD), the new Xeons and Opterons will set some new expectations in the x86 server chip arena.
For one thing, the "multi" in multicore is about to become a lot more meaningful. Instead of simply doubling the core count, which was the model in the past, when the industry moved en masse from uni-core to dual-core to quad-core, we're now going to see processors with 2, 4, 6, 8, and 12 cores filling different niches in the server space.
This month, Intel is expected to roll out its 6-core Westmere EP processor aimed at dual-socket platforms. For 4-socket systems and above, the 8-core Nehalem EX is expected before mid-year. Intel is also planning a faster-clocked 6-core Nehalem EX variant, targeted especially at the HPC market. Meanwhile, AMD is set to launch its 8- and 12-core Magny-Cours Opterons at about the same time as the first Westmere chips. Unlike Westmere EP, though, Magny-Cours will support both 2- and 4-socket servers.
Given that diversity, server makers will have a lot more choice in how they balance FLOPs against memory capacity, memory bandwidth, and I/O across different product niches. This is especially true for HPC, where the memory wall problem is particularly prominent. In fact, in this post-quad-core era it's worth remembering the 2009 Sandia study suggesting that performance would drop for certain data-intensive apps when the underlying platform moved beyond eight cores:
A Sandia team simulated key algorithms for deriving knowledge from large data sets. The simulations show a significant increase in speed going from two to four multicores, but an insignificant increase from four to eight multicores. Exceeding eight multicores causes a decrease in speed. Sixteen multicores perform barely as well as two, and after that, a steep decline is registered as more cores are added.
That suggests the most likely consequence of core proliferation will be a greater emphasis on memory capacity and bandwidth per node. As processors have added performance, the ratios of memory capacity to flops (bytes per flop) and of memory bandwidth to flops (bytes/sec per flop) have been dropping, leaving a lot of on-chip performance unused. To counter that, we're starting to see a trend back to big-node, shared memory systems. Frankly, most of the commercial solutions for x86-based systems focus on increasing memory capacity rather than bandwidth, since the latter is far more difficult to achieve without design help at the CPU level. Nevertheless, adding memory can indirectly ease the bandwidth problem, since aggregate access increases as you add more RAM.
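To make the ratio argument concrete, here is a minimal Python sketch that computes both balance ratios for a single node. The node parameters (10 GFLOP/s per core, 32 GB of RAM, 32 GB/s of memory bandwidth) are hypothetical round numbers, not figures for any of the chips mentioned above; the only point is that when core count grows and the memory subsystem doesn't, both ratios fall in proportion.

```python
# Hypothetical sketch: memory balance ratios for one node at peak compute.
# GB / (GFLOP/s) equals bytes per (flop/s), so the 1e9 factors cancel.

def balance_ratios(cores: int, gflops_per_core: float,
                   mem_gb: float, mem_bw_gbs: float) -> tuple[float, float]:
    """Return (bytes per flop/s, bytes/s per flop/s)."""
    peak_gflops = cores * gflops_per_core
    capacity_ratio = mem_gb / peak_gflops       # memory capacity per flop/s
    bandwidth_ratio = mem_bw_gbs / peak_gflops  # memory bandwidth per flop/s
    return capacity_ratio, bandwidth_ratio


if __name__ == "__main__":
    # Same hypothetical memory subsystem, growing core count.
    for cores in (4, 6, 8, 12):
        cap, bw = balance_ratios(cores, 10.0, 32.0, 32.0)
        print(f"{cores:2d} cores: {cap:.3f} bytes/flop, {bw:.3f} bytes/s per flop/s")
```

With these made-up inputs, going from 4 to 12 cores cuts both ratios from 0.800 to about 0.267, which is the kind of erosion that pushes vendors toward bigger-memory nodes.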
The move to bigger memory machines has already begun. NCSA is getting ready to install Ember, a large-scale shared memory SGI Altix UV supercomputer. That machine will be used for computational chemistry as well as solid and fluid dynamics research. ScaleMP, which uses its vSMP technology to concoct virtual SMPs, has had a number of wins lately, including the Gordon cluster at the San Diego Supercomputer Center. Although that machine is best known for its use of flash memory, the vSMP technology is used to build "supernodes" that can access as much as 2 TB of RAM. Relative newcomer 3Leaf Systems recently announced that Florida State University will deploy the company's "fabric computing" technology to aggregate multiple Opteron-based nodes into virtual shared memory servers. Finally, although not aimed at HPC, IBM just unveiled its eX5 servers, which allow users to expand RAM to 1.5 TB per two-socket machine.
The burgeoning core count also raises a sort of existential question for a lot of HPC users. In a Linux Magazine article, Douglas Eadline noted that since more than half of HPC apps use 32 cores or fewer (according to both IDC research and a Cluster Monkey survey), it's possible low-end HPC work will migrate from clusters to single nodes. In that case, multi-socketed workstations could end up replacing traditional clusters.
Well, the sweet spot of such workstations is still dual-socket systems (as it is for servers), so we'll really have to wait until 16-core chips hit the streets next year to answer that question. On the other hand, considering that the latest GPUs from AMD and NVIDIA (especially the upcoming Fermi processors) can take the place of multiple high-end CPUs for a range of HPC workloads, we may not need dozens of x86 cores to push a lot of low-end supercomputing onto the desktop. In fact, the presence of general-purpose GPUs makes double-digit core counts somewhat superfluous in these cases, unless someone can figure out a way to match up graphics processors with CPU cores.
One final thought. When considering how multicore CPUs are distorting system balance, it's tempting to get hung up on efficiency metrics and maximizing hardware resources. But as John Gustafson has reminded us: "System balance is not about bytes per flops/s, mass storage/RAM, or any such ratios. It never has been. System balance means adding something to the design such that the percent improvement in value (performance, reliability, or whatever) is greater than the percent improvement in the total cost of ownership. A system is perfectly balanced when no further such improvements are possible."
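Gustafson's definition is essentially a decision rule, so it can be written down directly. The sketch below is a hypothetical illustration: the function names and the numbers are assumptions, not Gustafson's. An upgrade is worth making only if its fractional gain in value (performance, reliability, or whatever) exceeds its fractional increase in total cost of ownership, and a design is balanced when no candidate upgrade passes that test.

```python
# Hypothetical encoding of Gustafson's balance criterion.

def worth_adding(value_before: float, value_after: float,
                 tco_before: float, tco_after: float) -> bool:
    """True if the percent gain in value exceeds the percent gain in TCO."""
    value_gain = (value_after - value_before) / value_before
    cost_gain = (tco_after - tco_before) / tco_before
    return value_gain > cost_gain


def is_balanced(value: float, tco: float,
                upgrades: list[tuple[float, float]]) -> bool:
    """Balanced when no candidate (new_value, new_tco) upgrade is worth adding."""
    return not any(worth_adding(value, v, tco, c) for v, c in upgrades)


if __name__ == "__main__":
    # Illustrative numbers: +30% value for +10% TCO passes; +5% for +20% fails.
    print(worth_adding(100.0, 130.0, 1000.0, 1100.0))     # True
    print(worth_adding(100.0, 105.0, 1000.0, 1200.0))     # False
    print(is_balanced(100.0, 1000.0, [(105.0, 1200.0)]))  # True
```

Note that nothing in the rule mentions bytes per flop; a ratio only matters to the extent that changing it moves value faster than cost.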
Posted by Michael Feldman - March 04, 2010 @ 4:42 PM, Pacific Standard Time
There are 2 discussion items posted.
Multi core deployment becomes a memory game
Submitted by truly64 on Mar 5, 2010 @ 8:27 AM EST
The big issue for our HPC deployments is that researchers are looking for at least 2GB of RAM per core, with 4GB on the horizon. As core count goes up, we have to move to higher-density DIMMs, such as 4GB DIMMs, which cost more than twice as much per GB as 2GB DIMMs.
If you do the math, it is actually less expensive to buy dual quad-core CPUs and load the system with lower-cost, less dense memory than to move to higher core count CPUs.
And, like Sandia, we have found that application performance suffers as core count increases. We think this is due to more cores accessing a single interconnect, but don't have a definitive analysis yet.
That being said, our next buy may well be single-socket Magny-Cours 8-core systems that maximize DIMM slots.
Post #1
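The commenter's DIMM arithmetic can be sketched the same way. The Python below covers only the memory half of the comparison (CPU prices are left out), and every input is hypothetical: 12 DIMM slots per node, 2 GB of RAM per core, and DIMM prices chosen so that 4GB DIMMs cost a bit more than twice as much per GB as 2GB DIMMs, as the comment describes.

```python
import math
from typing import Optional

# Hypothetical sketch of the DIMM-cost arithmetic; prices and slot count are
# illustrative assumptions, not quotes from any vendor.

def memory_cost(cores: int, gb_per_core: int, dimm_slots: int,
                dimm_prices: dict[int, float]) -> Optional[float]:
    """Cheapest single-DIMM-size fill reaching cores * gb_per_core GB.

    dimm_prices maps DIMM size in GB to price per DIMM. Returns None when
    no DIMM size can supply the required memory within the slot count.
    """
    needed_gb = cores * gb_per_core
    best: Optional[float] = None
    for size_gb, price in dimm_prices.items():
        count = math.ceil(needed_gb / size_gb)  # DIMMs of this size required
        if count > dimm_slots:
            continue  # not enough slots at this density
        cost = count * price
        if best is None or cost < best:
            best = cost
    return best


if __name__ == "__main__":
    # Hypothetical prices: 2GB DIMM = $50 ($25/GB), 4GB DIMM = $220 ($55/GB).
    prices = {2: 50.0, 4: 220.0}
    for cores in (8, 24):  # dual quad-core vs. dual 12-core
        cost = memory_cost(cores, 2, 12, prices)
        print(f"{cores} cores: ${cost:.0f} total, ${cost / cores:.0f} per core")
```

With these made-up prices, the 24-core node cannot fit 48 GB into 12 slots using 2GB DIMMs, is forced onto 4GB DIMMs, and its memory cost per core rises from $50 to $110. Real conclusions depend on actual DIMM and CPU pricing.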
16 core
Submitted by gretta on Sep 1, 2010 @ 5:52 PM EDT
Really good article, Michael. I am looking forward to the 16-core chips next year. Let's hope they are ahead of schedule and get out early in 2011.
Post #2
Michael Feldman is the editor of HPCwire.