October 06, 2006
This week, Linux-on-Itanium fans convened at the Gelato Itanium Conference and Expo (ICE) in Singapore to talk about platform issues and spotlight success stories. Cameron McNairy, Itanium Processor Architect and Principal Engineer, gave the opening keynote and presented a couple of other technical sessions on the microprocessor architecture. HPCwire asked McNairy about Itanium's role in high performance computing, the current maturity of Itanium-based systems, and what we can expect to see in the future.
HPCwire: What do you think the Itanium processor brings to the table that can't be found in other RISC (Power, Sparc) and CISC (x86) architectures?
McNairy: The Itanium processor brings three things to the table: choice, flexibility and performance. Itanium systems are supported by six different OSes, over 10,000 applications and eight major OEMs providing specialized systems for different market segments. Other RISC-based systems are proprietary and don't come close to offering the breadth of solutions that Itanium can. The end result is that Itanium OEMs can invest in hardware systems that deliver more options across a wider range of potential customers. For example, HP has offered three different RISC-based systems -- PA-RISC, Alpha, and NonStop -- each with its own associated OS. While you could not get a NonStop kernel on PA-RISC or OpenVMS on MIPS, HP's Itanium-based Integrity servers are available to serve NonStop, OpenVMS, Linux, Windows, and HP-UX customers, which is a huge advantage for HP. The dual-core Itanium 2 processor holds four world performance records, including a SPECint_rate_base2000 score of 4230, nearly triple the previous record.
HPCwire: What attributes of the Itanium make it particularly well-suited to HPC workloads?
McNairy: The direct support of large memory SMP systems makes Itanium an ideal architecture for the most demanding HPC workloads. With 50-bit physical addressability, Itanium is poised to address and manage petascale memory subsystems. As we enter the petascale era, much must be considered regarding data sets and programmability of HPC systems. Time to solution is a major consideration. Where the application space hasn't had time to accommodate explicit message passing algorithms, Itanium-based SMP systems offer a fast time-to-solution environment. Combined with the multi-level error containment architecture, Itanium-based SMP systems can deliver petascale performance quickly and can have a sufficient MTBI to allow the calculations to actually complete. The HPC arena was one of the key design targets of both the Itanium architecture and the Itanium 2 implementations.
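To put the 50-bit figure in concrete terms, the arithmetic can be sketched as follows. This is an illustration only and assumes byte-granular addressing, which the interview does not state; the function and constant names are hypothetical.

```python
# Illustrative arithmetic: how much memory a 50-bit physical address can name.
# Assumption (not stated in the interview): each address refers to one byte.

PHYSICAL_ADDRESS_BITS = 50  # figure quoted by McNairy

TIB = 1 << 40  # bytes in one tebibyte
PIB = 1 << 50  # bytes in one pebibyte


def addressable_bytes(bits: int) -> int:
    """Return the number of distinct byte addresses for a given address width."""
    if bits < 0:
        raise ValueError("address width must be non-negative")
    return 1 << bits


total = addressable_bytes(PHYSICAL_ADDRESS_BITS)
print(f"{total} bytes")          # 1125899906842624 bytes
print(f"{total // TIB} TiB")     # 1024 TiB
print(f"{total // PIB} PiB")     # 1 PiB
```

Under that assumption, 50 address bits cover exactly one pebibyte, which is the sense in which the address space reaches "petascale" memory.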
HPCwire: How do you think the system OEMs are doing in exploiting the potential of Itanium?
McNairy: Great, but there is more work to be done. We work closely with our customers and are often amazed at the interesting things they are doing with our processors. For example, SGI's large SMP systems and the NEC, HP and Unisys systems take full advantage of the reliability, scalability and availability features of the architecture. At the same time, we see some of them approaching the processor as a black box, leading them to miss out on some of the key capabilities and configurations. Accordingly, we work closely with our customers to help them deliver the best Itanium platforms possible. For example, Intel provides tools and training, along with access to our architects to resolve questions and concerns, and to enable designs. Software is one key element of any system, and poor software can make even the best system perform poorly. We see software as a critical link in the Itanium platform chain. To achieve critical software momentum and address key software issues, the Itanium Solutions Alliance was formed last year to broaden software development. The Alliance pools the efforts of multiple OEMs, OSVs, and ISVs for added momentum and critical mass.
HPCwire: How would you characterize the current maturity of software support for Itanium (compilers and other development tools, libraries, OS support, applications, etc.)? What are the biggest challenges here?
McNairy: I am really pleased with the maturity of software for Itanium, but again, there is more work to be done. There are over 10,000 key applications that support Itanium today. The compilers are maturing and continue to deliver performance and robustness for Itanium systems. Could software support be even better? Yes. Will it get better? Absolutely. This is a marathon, not a sprint. Software momentum is ramping quickly with multiple innovations coming soon. One of the biggest challenges is scaling operating systems to 32-, 64- and up to 128-processor systems. We want broader focus from the software industry because of the opportunity that exists. We continue to work with this industry to extend their focus.
HPCwire: Can you talk a little bit about future Itanium features and their significance?
McNairy: In the future, Intel is planning on providing new features for the high-end market segment such as new reliability and availability features, multi-core processors and even higher speed interconnects. We will continue to work very closely with hardware and software vendors to better understand their needs while investigating ways to incorporate that input and knowledge into future designs.
HPCwire: A bit of a philosophical question here. There's been increasing talk in the HPC community about heterogeneous systems in which future platforms will be populated by a variety of specialized processors -- GPUs, Cell processor, FPGAs, vector processors, FP co-processors, etc. -- where each processor has the ability to deal with certain types of code more efficiently than a general-purpose processor. Do you think this is a natural evolution of computer architectures? And if so, where does this leave general-purpose scalar/FP processors?
McNairy: The answer depends on the ability to simplify the transition such that the return on investment is huge. Hardware is certainly easier to change than software, and software would need to change to support special-purpose, non-general-purpose compute enhancement devices effectively. Thus, I see the current niche continuing -- specialty processors abstracted by a dedicated library that the application writer does not know about -- until we change the way we think about computational problems and the way we put algorithms to software. I am personally very excited about special purpose computing elements (my master's thesis was on FPGA FP acceleration using a systolic design). I think the implementation is simple; the challenge lies in successfully abstracting the hardware such that the cost is low enough to produce a return on investment. Will user-defined instructions/capabilities make it onto the processor? Certainly, but only in limited use cases until we address the software problem.
-----
Cameron McNairy is a Principal Engineer and an Intel Architect for the Montecito program. Before Montecito, Cameron was a micro-architect for the Itanium 2 processor, contributing to its design and final validation. He plans to focus on performance, RAS (reliability, availability, serviceability), and system interface issues in the design of future IPF products. He joined the Itanium 2 team soon after its inception, having previously done performance work on the first Itanium processor. Cameron received a BSEE and an MSEE from Brigham Young University. He is a member of the Institute of Electrical and Electronics Engineers.