Sending out an SOS: HPCC Rescue Coming

By Christopher Lazou

May 26, 2006

“I don't know where we are going, but we'll get there quicker if we get started.” — David Bernholdt, ORNL (SOS9, March 2005).

SOS is the recognized international distress call for help, and it was quite apt for “capability” computing in the 1990s, especially in the U.S. Come the new century, and thanks to some help from new R&D funds for high productivity systems, IBM, Sun Microsystems Inc. and Cray Inc. are working hard to offer a rescue pathway.

The SOS Forum series was founded in 1997 on the initiative of people interested in High Performance Cluster Computing (HPCC) at Sandia National Laboratories and Oak Ridge National Laboratory, as well as EPFL in Switzerland. (EPFL is the Swiss cradle for the successful design and implementation of Beowulf systems.) SOS stands for “Sandia, Oak Ridge, Switzerland.” In 1997, the major centers were starting to explore the capacity of communication systems for building their own HPC clusters. (Note: At this time, Quadrics and Myrinet did not have commercial products.)

The SOS Forums take place annually in the spring and are open to anyone interested in discussing new and visionary ideas on HPCC, but the number of participants is deliberately kept low (not more than 50). The ninth SOS workshop took place in Davos, Switzerland, last March. For further details visit the SOS website: http://www.eif.ch/sos/

The thrust of the SOS Forum is to foster multi-laboratory, multi-national collaboration to explore the use of new parallel supercomputer architectures, such as clusters built from commodity components, heterogeneous supercomputing and web supercomputing; it is not focused on any particular system.

The theme of the ninth SOS Forum was Science and Supercomputers. The received wisdom is that “Today science is enabled by supercomputing, but tomorrow science breakthroughs will be driven by supercomputers.” The workshop explored what is needed to prepare for an age when manipulating huge data sets and simulating complex physical phenomena are used routinely to predict and explain new scientific phenomena.

The questions addressed at SOS9 were:

  • What are the computational characteristics needed to facilitate this transition?
  • How can the existing and emerging supercomputer architectures be directed to help science?
  • Is there a need for new facility models that cater to large science or is the traditional supercomputer center with thousands of users sufficient for the future?
  • What software and programming models are being explored to make it easier for scientists to utilize the full potential of supercomputers?

The SOS9 Forum was a tour de force of personalities from the U.S. and Europe discussing world-class activities at their sites and furnishing some insights on how future HPC products can effectively serve their scientific communities and the needs of science at the national level. These sites have heterogeneous environments and use systems from several major vendors.

Sites such as CSCS in Switzerland, the HPCx facility in the UK and ORNL in Oak Ridge are on a development path that will define capability scientific computing for at least the next decade. The trend is toward setting up partnerships between centers and computer vendors, as well as collaborations with centers of excellence across national boundaries. A good example is the partnership between Sandia and Cray developing the $90 million Red Storm system. Bill Camp's motto is “Use high-volume commodity components almost everywhere, but when necessary for scalability, performance and reliability use custom development.”

The engineering task was how to deliver 40 Tflops peak performance using 10,000 AMD Opteron chips and a specially designed high bandwidth, low latency interconnect. Red Storm is already a great success; it has since been made into a Cray product and is marketed as the Cray XT3.
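As a back-of-the-envelope check (my arithmetic, not figures from the article), 40 Tflops spread over roughly 10,000 chips works out to about 4 Gflops per Opteron, consistent with an assumed 2.0 GHz single-core Opteron retiring two double-precision floating-point operations per clock cycle:

    # Back-of-the-envelope peak-performance check for Red Storm.
    # Assumed (not from the article): ~2.0 GHz single-core Opterons,
    # each retiring 2 double-precision flops per clock cycle.
    chips = 10_000
    clock_hz = 2.0e9
    flops_per_cycle = 2

    peak_per_chip = clock_hz * flops_per_cycle      # 4.0e9  = 4 Gflops per chip
    system_peak = chips * peak_per_chip             # 4.0e13 = 40 Tflops

    print(f"{peak_per_chip / 1e9:.1f} Gflops per chip, "
          f"{system_peak / 1e12:.1f} Tflops system peak")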

According to Bill Camp, “Red Storm is achieving its promise of being a highly-balanced and scalable HPC platform with a favorable cost of ownership. It is setting new high water marks in running key national security and science applications at Sandia and elsewhere.”

In March, CSCS, the national Swiss leadership computer center, bought a large Cray XT3 system as the first phase of its procurement cycle. CSCS has laid plans to team with leading U.S.-based supercomputing sites, the Pittsburgh Supercomputing Center, Oak Ridge National Laboratory and Sandia National Laboratories, to fine-tune the software environment and make the Cray XT3 technology mature for a broad spectrum of scientific production work.

According to Dr. Marie-Christine Sawley, CSCS CEO, “The Cray XT3 was bought as a highly scalable capability system for very demanding, high-end computational scientific and engineering research applications. The system is designed to support a broad range of applications and positions CSCS as a leadership-class computing resource supplier for the research community of Switzerland. It also positions it for attracting highly visible, value-added international collaborations.”

Sawley explained in her SOS presentation that CSCS's systems prior to the Cray XT3 included an IBM SP4, over 60 percent of which was used for chemistry codes, and an NEC SX-5 high memory bandwidth vector system, 44 percent of which is used for meteorology/climate applications. CSCS still offers services on both the SX-5 and the SP4 systems; the XT3 represents an extension of its computing capacities toward true MPP. The phase two procurement this autumn is looking at providing suitably upgraded computing resources for the SX-5 user community, which requires high memory bandwidth capability computing.

CSCS is working to establish collaborations, including a visitor program with centers that have Cray XT3 systems, on porting applications, system tuning and tools. CSCS is offering applications in chemistry, molecular dynamics, environment, materials science and physics, drawn from its core competences and customer portfolio. Tools such as performance monitoring, debuggers and visualization are also part of the focus of interest.

The keynote by Dr. Thomas Zacharia, Associate Lab Director for Computing and Computational Sciences at ORNL, titled “A new way to do science: Leadership Class Computing at ORNL facilities,” typifies what these centers are likely to develop into. ORNL was awarded funding by the DoE to address the opportunities and challenges of Leadership computing. This involves in part developing and evaluating emerging — but unproven — experimental computer systems. Their brief is to focus on Grand Challenge scientific applications and on computing infrastructure driven by applications. The goal of Leadership systems is to deliver computational capability that is at least 100 times greater than what is currently available. It is acknowledged by funding bodies that Leadership systems are expensive, typically costing about $100 million a year.

It is now recognized that a focused effort is critical in order to harness the experimental potential of computing and translate it into breakthroughs in science. The infrastructure needed consists of capability platforms with ultra-scale hardware, the software and libraries to exploit them efficiently, teams of hardware and software engineers and, most importantly, funding for seamless access by research teams of scientists investigating Grand Challenge problems.

With DoE funding, ORNL recently set up the National Leadership Computing Facility. In the computing platform area, NLCF is concentrating on developing and proving several Cray purpose-built architectures, optimized for specific classes of applications.

NLCF has recently installed a 1,024-processor Cray X1E with an aggregate peak performance of 18.5 Tflops — the largest Cray vector system in the world. The Cray X1E has a proven vector architecture for high performance and reliability, very powerful processors and a very fast interconnect subsystem. It is scalable, has globally addressable memory with high bandwidth and offers capability computing for key applications. This system has been allocated to five high-priority Office of Science applications as follows:

  • 3D studies of stationary accretion shock instabilities in core collapse supernovae (415,000 processor hours).
  • Turbulent premix combustion in thin reaction zones (360,000 processor hours).
  • Full configuration interaction benchmarks for open shell systems (220,000 processor hours).
  • Computational design of the low-loss accelerating cavity for the ILC (200,000 processor hours).
  • Advanced simulations of plasma micro-turbulence (50,000 processor hours).

Another platform just installed is a 5,212-processor AMD Opteron-based Cray XT3 system with an aggregate peak performance of 25.1 Tflops. It has an extremely low latency, high bandwidth interconnect, efficient scalar processors and a balanced interconnect between processors, providing capability computing. Although the Cray XT3 is new, its architecture is proven, as it is based on ASCI Red. It uses the Linux operating system on service processors and a specially adapted microkernel on compute processors for optimal performance. According to Zacharia, benchmarks show this system is No. 1 in the world on four of the HPC Challenge tests and No. 3 in the world on the fifth.

To give a feel for the power of this system: in August 2005, just weeks after the delivery of the final cabinets of the Cray XT3, researchers at the National Center for Computational Sciences ran the largest-ever simulation of plasma behavior in a Tokamak, the core of the multinational fusion reactor ITER.

The code used for the ITER work, AORSA, solves Maxwell's equations — describing the behavior of electric and magnetic fields and their interaction with matter — for hot plasma in Tokamak geometry (i.e., the velocity distribution function for ions heated by radio frequency waves in Tokamak plasma). The largest run, by ORNL researcher Fred Jaeger, utilized 3,072 processors — roughly 60 percent of the entire Cray XT3. The Cray XT3 run improved total wall time by more than a factor of three over the center's IBM P3 system.
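As a rough sketch of what such a full-wave code computes (my schematic form, not taken from the article), the problem can be written as the frequency-domain wave equation for the electric field, with the hot-plasma response folded into a conductivity operator and the antenna supplying the driving current:

    \nabla \times \nabla \times \mathbf{E}
      - \frac{\omega^{2}}{c^{2}} \mathbf{E}
      = i \omega \mu_{0} \left( \mathbf{J}_{\mathrm{plasma}} + \mathbf{J}_{\mathrm{antenna}} \right),
    \qquad
    \mathbf{J}_{\mathrm{plasma}} = \boldsymbol{\sigma}_{\mathrm{hot}} \cdot \mathbf{E}

Here the hot-plasma conductivity \boldsymbol{\sigma}_{\mathrm{hot}} encodes the velocity-space response of the ions to the radio frequency waves; discretizing this operator over the whole Tokamak cross-section is what typically produces the very large linear systems that demand thousands of processors.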

The importance of this improved performance cannot be overstated. For decades, researchers have sought to reproduce the power of the sun, which is generated by fusion of small atoms under extremely high temperatures — millions of degrees Celsius. The U.S., Europe and other nations have joined forces to develop the multi-billion dollar International Thermonuclear Experimental Reactor. ITER's donut-shaped reactor uses magnetic fields to contain a rolling maelstrom of plasma, or gaseous particles, which comprise the “fuel” for the fusion reaction.

Cost-effective and efficient development and operation of ITER depend on the ability to understand and control the behavior of this plasma: its physics and optimal conditions that foster fusion. Harnessing fusion for future “clean” energy will have worldwide environmental ramifications.

NLCF expects to deploy a 100 Tflops Cray XT3 in 2006, followed by a 250 Tflops Cray “Rainier” system in 2007 or 2008. Rainier is a unified product incorporating vector, scalar and potentially reconfigurable and multi-threaded processors in a tightly connected system. This heterogeneous architecture offers a single-system solution for diverse application workloads.

The NLCF is built as a world-class facility. It consists of a 40,000-square-foot computer room and an 8 MW power supply. It also contains classrooms and a training area for users, a high-ceiling area for visualization (cave, power-wall, Access Grid, etc.) and separate laboratory areas for computer science and network research.

Using high bandwidth connectivity via major science networks (NSF TeraGrid, Ultranet and “Futurenet”), NLCF aims to integrate core capabilities and deliver computing for “frontiers” science. The program includes joint work with computer vendors to develop and evaluate next-generation computer architectures (e.g., Cray systems and IBM Blue Gene/L), creation of math and computer science methods to enable use of the resources (e.g., SciDAC, ISIC), nurturing of scientific application partnerships and funding of modelling and simulation expertise. The ultimate goal is to transform scientific discovery in fields such as biology, climate, fusion and materials, and to serve industry and other government agencies, through advanced computing.

Instruments for international collaboration are also important for shortening time to solution and enhancing the potential for scientific breakthroughs. The ORNL program for Leadership computing includes collaborations with other large-scale computing centres, e.g. Sandia, PSC and CSCS.

As Zacharia said: “ORNL has a long standing partnership with Sandia and CSCS on many fronts; collaborations in applications areas, collaborations in enabling technologies, sharing of best practices in managing and operating our respective centers and, of course, our historical partnership in the SOS series of Forums.”

The NLCF is primed for active dialogue with academia, industry, laboratories and other HPC centers. The Joint Institute for Computational Sciences is to be a state-of-the-art distance-learning facility. It aims to provide incubator suites, joint facility offices, conference facilities and strong student and post-doctoral programs. It supports educational outreach through research alliances in math and science programs, and industrial outreach through a computational center for industrial innovation. It also supports international collaborations in computational sciences by hosting guest scientists and visiting scholars.

Another speaker, Dr. Paul Durham of CCLRC Daresbury Laboratory, described capability computing on HPCx, an IBM Power4-based system used as a national resource for UK research. After giving many examples of scientific results, he described the project to move user consortia onto capability computing — defined as needing more than 1,000 processors — as follows: “Research done on HPCx is driven by specific scientific goals, set out in the peer reviewed grant applications. Some users are obtaining excellent results running on 128 to 256 processors. There may be no scientific case for moving these into the capability regime. The intention for the HPCx facility was that resources should only be granted to consortia with true capability requirements.”

Durham concluded by asking a series of questions. The computational research community has identified many fascinating and important Petascale problems, but has it achieved enough capability usage at the Terascale? What are the best capability metrics? Do they have to be hardware based? Can capability science be defined? How many projects can be sustained before the capability mission gets diluted? Are there enough “capability” users with Petascale ambitions coming through? Are they in new fields or the usual suspects? Can we expect new fields for “capability” computing to arise spontaneously, or should we lead them to it?

Michele Parrinello, a professor of computational science at ETH Zurich, gave a keynote presentation titled “The challenges of scientific computing.” He described many interesting scientific results in chemistry and molecular dynamics. He asked the rhetorical question: Why do simulations? His reply: to interpret experimental results, to replace costly or impossible experiments, to gain insights and possibly to predict new properties (e.g., virtual microscopy).

Another question was whether one can use molecular dynamics to explore long time scale phenomena. The answer, at present, is no. Direct simulation allows only very short runs of ~10 ps for ab initio MD and ~10 ns for classical MD. Many relevant phenomena need longer time scales: chemical reactions, diffusion, nucleation, phase transitions, protein folding and so on.
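To see why these limits bite, a simple step-count estimate helps (a sketch using typical assumed timestep sizes, not figures from the talk): with femtosecond-scale timesteps, even the short runs quoted above require millions of force evaluations, and biologically interesting time scales are many orders of magnitude further away.

    # Rough step-count arithmetic with typical assumed timesteps:
    # ~0.5 fs per ab initio MD step, ~1 fs per classical MD step.
    fs = 1.0e-15                      # one femtosecond in seconds

    ab_initio_reach = 10.0e-12        # ~10 ps reachable with ab initio MD
    classical_reach = 10.0e-9         # ~10 ns reachable with classical MD
    folding_scale   = 1.0e-3          # ~1 ms, order of slow protein folding

    print(f"{ab_initio_reach / (0.5 * fs):.0e} ab initio steps")   # ~2e4
    print(f"{classical_reach / (1.0 * fs):.0e} classical steps")   # ~1e7
    print(f"{folding_scale / (1.0 * fs):.0e} steps to reach ms")   # ~1e12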

Another presentation, from Professor Andreas Adelmann of the Swiss Paul Scherrer Institut, described research and “HPC demands in computational accelerator physics.” He briefly presented particle accelerators and how they are modelled, with working examples, and elaborated on next-generation particle accelerators, namely the high energy LHC, the high intensity Spallation Neutron Source and the high brilliance light source, and their modelling needs. This was illustrated by several examples, such as Particle-In-Cell simulations using a low-dimension Vlasov solver for relativistic electrodynamics, including collisions and so on.

His conclusion was that the HPC hardware needed consists of a large number of tightly coupled CPUs with access to low latency, high bandwidth memory, especially for the large 3D n-body problem (in space and time) and for the fine-grid 4(6)D Vlasov solver. Fast I/O is essential, as post-processing is a parallel data mining activity. The software requirements are for efficient numerical implementations of FFT, MG and AMR, load balancing, and fault tolerant systems and algorithms. The Paul Scherrer Institut, and in particular the particle accelerator project headed by Dr. Adelmann, was pivotal to PSI's participation in the Horizon project, which culminated in the recent purchase of the Cray XT3 system by CSCS.

In a panel titled “How we as a community can try to get a richer and more uniform programming environment across the variety of high-end platforms?” the participants were Thomas Sterling (CACR, Caltech), David Bernholdt (ORNL), Pierre Kuonen (EIF) and Rolf Riesen (Sandia). Bernholdt discussed a uniform environment for high user productivity and the rapid creation of correct and efficient application programs.

He explained the different requirements for applications and algorithms, namely high-level specification and low-level control. There is a trade-off in delivering generality, abstraction and scalability. There are also proposals to develop “polyglot” programming, as described in a talk by Gary Kumfert (LLNL) at a workshop on high productivity languages and programming models (May 2004).

The requirements for these endeavors to succeed are: “Legacy codes must be supported; traditional and new programming languages, and traditional and new programming models, must be able to interoperate. Some language and model constructs are incommensurate, but for most, some useful specification for interoperability can be established. It was suggested that BABEL should be adopted as the language interoperability vehicle for HPC, as it provides a unified approach in which all languages are considered as peers. It can act as the bridge for C, C++, Java, F77, F90, F2003, Python, etc. It is essential that language interoperability is built into standards. For example, F2003 provides interoperability with C. When designing and implementing new languages it is advisable to assume they are to be used in a mixed environment.”
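BABEL itself works from interface descriptions and generates the glue code between languages; as a much simpler illustration of the general idea of calling across a language boundary (my sketch, not BABEL's approach, and assuming a Unix-like system where the C math library can be located), Python's ctypes can invoke a C routine directly:

    # Minimal cross-language call: Python invoking C's cos() via ctypes.
    # Illustrates language interoperability in general, not BABEL itself.
    import ctypes
    import ctypes.util

    libm_path = ctypes.util.find_library("m")   # locate the C math library
    libm = ctypes.CDLL(libm_path)               # load it into this process

    libm.cos.argtypes = [ctypes.c_double]       # declare the C signature so
    libm.cos.restype = ctypes.c_double          # arguments marshal correctly

    print(libm.cos(0.0))                        # 1.0, computed in C

The point of tools like BABEL is to automate exactly this kind of signature declaration and data marshalling across many languages at once, rather than handling each pair of languages by hand.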

Interoperability of programming models presently needs a lot of work, both in developing abstract specifications and in overcoming the practical obstacles to implementing them.

According to Bernholdt, productivity on diverse architectures is achievable using abstraction, vertical integration across the software stack and helpful hardware. Interoperability is also achievable in programming languages by using BABEL and standards, but this is much harder for programming models. A uniform programming environment is undesirable, as users need choices, not uniformity.

Computing has experienced exponential growth in the last 30 years and this is expected to continue. Yet the HPC user community, long promised Terascale computing by forecasters carried away with new technology, is, as Durham pointed out, only just conquering Terascale problems, so scaling up to Petascale is an enormous task. Now that the industry is building heterogeneous computers, attempting to match hardware to application needs (e.g., the Cascade approach described in an article by the High-End Crusader, HPCwire, 8-12-05), the problems of Petascale computing look more tractable. Only time will tell whether the user community will be able to utilize these systems by 2010.

One of the greatest challenges in achieving the 2010 target is delivering infrastructure for sustained performance. Technical challenges include chip densities and heat dissipation, power consumption and footprint at the component level, as well as the memory wall (bandwidth, latency and connectivity) in harnessing tens of thousands of CPUs to handle large-scale simulations. The National Leadership Computing Facilities being set up at ORNL, at PSC, at CSCS, in the UK and elsewhere are extending the frontiers of large-scale scientific computing.

Copyright: Christopher Lazou, HiPerCom Consultants, Ltd., UK. August 2005
