Phil Hester, Chief Technology Officer of Advanced Micro Devices, joined the company last September, taking over from Fred Weber. As CTO, he is responsible for setting the architectural and product strategies and plans for AMD's microprocessor business. Hester also chairs the AMD Technology Council, ensuring that product development, integration, and process organizations align technology capabilities with product direction.
Hester brings 30 years of system design and enterprise computing experience to AMD. Before joining the company, Hester was co-founder and CEO of Newisys, a start-up that produced enterprise-grade Opteron rack servers for OEMs. He spent 23 years at IBM serving in a variety of key leadership and executive technical roles. While at IBM, Hester led a number of system technology development efforts, including the RS/6000, and served as one of 15 members of the IBM Corporate Technology Council.
Building on Success
Hester joins AMD at a time when the company appears to be at the top of its game. Its x86 product line is now well-regarded throughout the industry. It's no secret that in the past few years, AMD's 64-bit Opteron processors have grabbed a lot of market share in the HPC cluster market. The success of the Opteron processor, whose price/performance is currently regarded by most analysts as among the best in the HPC industry, has propelled AMD's rise to power.
In particular, Hester attributes much of AMD's success in penetrating the HPC cluster market to the technical superiority of the AMD64 architecture. By reducing latency and improving bandwidth for both I/O and memory through integrated on-chip memory controllers, the Direct Connect Architecture and HyperTransport technologies proved essential in improving system performance and enabling scalability. It's hard to overemphasize the importance of addressing memory and I/O bottlenecks for data-intensive computations, where fast processors and big caches can only take you so far.
“Obviously price/performance is one of the top items for both integer and floating point codes,” said Hester. “The other is, of course, scalability, both in terms of memory and I/O bandwidth. Having an integrated memory controller and separate I/O links let us continue to build balanced systems as you scale up. Integrated memory controllers allow you to have higher sustained bandwidth and lower latency.”
Hester also noted that the ability to execute legacy x86 32-bit applications along with x86 64-bit applications gives end users a comfortable migration path to 64-bit computing. The support of the standard x86 environment allows both Linux and Microsoft applications to run very easily. According to Hester, AMD's power efficient architecture is also highly valued.
As high performance computing is being mainstreamed into the commercial market, AMD sees itself in a good position. The use of Opteron processors in Sun, IBM and HP servers, as well as in low-cost Taiwanese motherboards, has enabled AMD to enter the commercial data center from many avenues.
“In '03 and '04 I think we did pretty well in the [traditional] HPC market,” noted Hester. “This last year we gained credibility in the commercial data center.”
And as high performance computing penetrates the commercial business sector, AMD expects to leverage any success Microsoft has with its new HPC software ecosystem.
“I think a lot of the traditional HPC market is fairly entrenched around Linux,” said Hester. “More than anything else, their criteria is about how effectively your hardware runs in a Linux environment. There's a different story in the data center. In particular, if you look at some of these large business segments like Wall Street and big retailers that are doing large-scale data analysis and data mining, I think you'll see a fairly major impact on what Microsoft can do.”
AMD is not content to rest on its past achievements. According to Hester, AMD's 2006 plans for the AMD64 architecture focus on incremental improvements along with some additional functionality.
“You'll see the introduction of chips that support DDR2 memory,” said Hester. “The performance and capability that this allows will be important. We're also adding [hardware] virtualization support that will be important to people who are building consolidated development or verification environments. There will also be some enhancements to the error checking capabilities.
“The technology steps this year are modest,” admitted Hester. “What we try to do is time the availability of new technology with the adoption rate of the industry. For example, we had the ability to introduce DDR2 products earlier. But when you look at where the memory vendors and infrastructure are, our view is that the middle of '06 is the right time to introduce that technology. We try to balance the introduction of new technology with the ability of the industry to support those technologies.”
What may be of greater interest for 2006 is AMD's recent decision to license its coherent HyperTransport interconnect technology to selected OEMs. This is not the industry-standard (non-coherent) HyperTransport technology that is used to connect I/O chips. The coherent HyperTransport is AMD's proprietary interconnect that enables processor-to-processor communication, while maintaining cache coherency. The licensing of coherent HyperTransport will enable OEMs to add heterogeneous processors to a system without having to develop their own specialized hardware/software architectures.
“We've talked to a number of people in the HPC market about the need for specialized execution engines that run certain workloads where a specialized coprocessor is the right answer,” said Hester. “A vector floating-point coprocessor is one example [think Cray]; XML and Java coprocessor accelerator chips are other examples. Being able to support them over the coherent HyperTransport link will be a big deal for people building these specialized types of HPC machines.”
The Next Generation
Beyond 2006, AMD is looking to revamp its 64-bit core technology to further increase interprocessor communication bandwidth, increase component modularity and address power efficiency more aggressively. In addition, the implementation of the 65 nm manufacturing process — currently scheduled for the middle of 2006 — will enable greater processor efficiencies and should pave the way for quad-core chips in 2007 and beyond.
“In general you'll continue to see the introduction of new memory and I/O technology speed bumps,” said Hester. “We are also working at ways of making the coprocessor interface even more general-purpose and more efficient. Somewhat related to that, we're considering extensions to the base instruction set that gives you, for example, 80 to 90 percent of the performance benefit of dedicated hardware without the expense of the [extra transistors].
“Future designs will be more modular, in other words, more flexible in terms of how to configure the number of cores and the memory hierarchy around those cores on-chip. So you can start to think about configuring cores and cache memory differently for environments with a small number of threads versus a large number of threads.”
According to Hester, power management will be a significant design issue for all future AMD processors.
“There will be much more focus on power management, across the board,” said Hester. “Historically most people thought about power management in the notebook space. Now we think about power management in the desktop and server space the same as we do in the mobile space. A good number to keep in mind is that in a data center, for every watt that the system dissipates, you probably have about three watts of power feeding the data center; those other two watts are generally associated with cooling and power distribution.”
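The rule of thumb Hester cites (roughly three watts delivered to the data center for every watt the system dissipates) corresponds to what is now commonly expressed as a power usage effectiveness (PUE) of about 3. A minimal sketch of that arithmetic, with hypothetical numbers chosen only for illustration:

```python
def facility_power(it_watts: float, pue: float = 3.0) -> float:
    """Total data-center draw given the IT load and a PUE ratio.

    PUE = total facility power / IT equipment power. Hester's rule of
    thumb (~3 W in per 1 W dissipated by the systems) is a PUE of 3.
    """
    return it_watts * pue

def overhead_power(it_watts: float, pue: float = 3.0) -> float:
    """Watts going to cooling and power distribution, not computation."""
    return facility_power(it_watts, pue) - it_watts

# Hypothetical example: a rack of servers dissipating 10 kW.
it_load = 10_000  # watts of IT load
total = facility_power(it_load)    # 30,000 W fed to the data center
overhead = overhead_power(it_load) # 20,000 W of cooling/distribution
```

The point of the exercise: every watt saved at the processor saves roughly two more watts elsewhere in the facility, which is why power efficiency moved from a notebook concern to a server design criterion.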
AMD also intends to improve memory and I/O bandwidth and latency as it adds more cores to its chips.
“As we look at these future processors that have more cores on them, you also have to be able to support higher memory bandwidth and more I/O,” said Hester. “So think about wider memory interfaces and more external I/O links on the processors. As you do that, you also have more of these coherent interconnect links that will allow you to build larger cache-coherent SMPs that have fewer hops between the processors, which means the performance would scale better as you build these larger SMPs.”
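Hester's point about hop count can be made concrete with a toy model (illustrative only, not a description of AMD's actual topologies): the average number of link hops between sockets in a sparsely connected ring versus a fully connected mesh. Fewer hops mean lower worst-case remote-memory latency as the SMP grows.

```python
from collections import deque
from itertools import combinations

def avg_hops(n, neighbors):
    """Average shortest-path hop count over all socket pairs,
    computed by breadth-first search on an adjacency function."""
    total, pairs = 0, 0
    for src, dst in combinations(range(n), 2):
        seen = {src}
        queue = deque([(src, 0)])
        while queue:
            node, dist = queue.popleft()
            if node == dst:
                total += dist
                break
            for nxt in neighbors(node, n):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        pairs += 1
    return total / pairs

# Ring: each socket links only to its two neighbors.
ring = lambda i, n: [(i - 1) % n, (i + 1) % n]
# Fully connected: a coherent link from every socket to every other.
full = lambda i, n: [j for j in range(n) if j != i]

# For 8 sockets, a ring averages ~2.3 hops; full connectivity is 1.
print(avg_hops(8, ring), avg_hops(8, full))
```

More coherent links per processor move real topologies from the ring end of this spectrum toward the fully connected end, which is the scaling benefit Hester describes.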
AMD is on track both to develop higher levels of multi-core processors and to support those processors in larger SMP systems. AMD intends to support both the scale-out and scale-up models. Hester believes both have their place.
“When we talk to customers, some want to build large clusters from low-cost, very good price/performance 1U building blocks,” said Hester. “Others want to build large systems out of a 4- to 8-way SMP building block. You can find workloads that run effectively in one environment and not the other and vice-versa.”
Asked about AMD's aspiration to develop anything beyond the x86 architecture, Hester emphatically denied there are any plans to abandon the x86. Apparently the company is not even planning to jettison 32-bit compatibility on any future designs. According to Hester, there's still too much 32-bit code to justify doing that and not enough hardware savings to make it worthwhile.
“We're pretty religious about the x86,” confessed Hester. “Software compatibility matters. There's nothing fundamentally wrong with the x86 architecture. If you talk to some pure computer scientists, you'll get a whole litany of things that are inelegant about it. But it's good enough, and with the extensions that were made in the revision for the 64-bit architecture, the major issues are fixed. I believe it's been estimated there are close to 100 billion lines of x86 code out there. I think the Itanium fiasco has kind of proved the point that it's [very difficult] to introduce a radically different architecture successfully — particularly one that doesn't provide any real system level benefits. So if we look at the limitation on the x86 architecture, it's not a problem. What it would cost you, in terms of breakage of software and everything else, isn't worth the small benefit you might get by introducing a brand new architecture.
“Now with that said, one thing we will look at, particularly as we evaluate the workloads for XML and Java applications, is the possibility of adding instructions to the base ISA to make those environments run more efficiently. The analogy I would use is the evolution of the graphics instructions through 3DNow. But any instructions added would be an extension, retaining full binary compatibility with the x86 base ISA.”
It's clear that AMD's ambitions are not only to compete with Intel's x86 offerings, but with other architectures as well. As the AMD processor technology evolves, Hester sees the x86 architecture gradually replacing RISC-based platforms in general-purpose HPC systems.
“We expect that x86 technology will replace the proprietary RISC processors in the high-end systems of the future, and as that transition happens, we expect to pick up our fair share of that market,” said Hester.
When asked if he thought AMD technology would someday be used in petascale systems, such as the ones being designed for DARPA's High Productivity Computing Systems (HPCS) project, here's what Phil Hester had to say:
“Absolutely! I came out of IBM and was involved in the SP2 [Scalable POWERparallel 2] work; a number of other people that are now here came from DEC and have built large systems. So we certainly understand what's required from both a microprocessor building block level and from a system design level to build that class of system. We want to be sure that we put those enabling functions in the next generation of processors so that we can play in that marketplace.”