The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
April 13, 2007
Last week at the High Performance Computing and Communication Conference in Newport, Rhode Island, Doug Kothe gave an overview of the leadership computing facility at Oak Ridge National Laboratory (ORNL) and talked about the lab's plans for its future computing systems. Kothe, the Director of Science in the National Center for Computational Sciences (NCCS) at ORNL, and a nuclear engineer by training, is no stranger to supercomputers. He has spent most of his career at Los Alamos National Laboratory developing and working with CFD and other multi-physics codes. Before coming to ORNL in January 2006, he was the Deputy Program Director of the LANL ASC Program.
As part of his presentation at the conference, Kothe gave the audience a sense of the preparations going on around the upcoming Cray supercomputer deployments. As one of the Department of Energy's leadership computing facilities, ORNL is in line to get some of the most powerful systems on the planet. By late 2007, ORNL will have upgraded the existing 119-teraflop Cray XT4 'Jaguar' system to a peak performance of 250 teraflops. By late 2008, a new one-petaflops Cray 'Baker' system will be installed. Both machines will employ the upcoming quad-core AMD Opteron processors.
The current and planned systems at ORNL represent the largest open resources for computational science research in the world. Scientific research on these machines is conducted through projects that are granted allocations via the highly competitive and popular INCITE Program (http://hpc.science.doe.gov/allocations/incite/).
While the computing hardware plans are already in place, the lab is busy lining up other infrastructure and getting the applications ready for the new systems. Although the Cray systems were specifically selected for the types of "big science" applications that the DOE runs, there is still a great deal of work to be done in getting the codes ready for the new systems. In addition, since the optimal types of I/O systems and archive storage are dependent on the application dataset requirements, the storage systems still need to be matched up with the workloads.
"Requirements flow both ways," said Kothe. "The applications impose requirements on the systems and the systems impose requirements on the applications." He said that until they get the thousands of quad-core AMD processors on-site, detailed upstream computer and computational science performance analysis and modeling is required to get a handle on how the applications are going to perform. The developers and NCCS staff are also using testbeds and simulators in this process.
With the next Jaguar upgrade less than six months away, the DOE Office of Advanced Scientific Computing Research (ASCR) has selected the applications that will be granted early user access on the new system. Part of the process involved surveying 20 to 30 different applications teams for the suitability of their codes. The teams were asked questions like: "If you had a 250-teraflop system all to yourself for a short while, what would you do? What are you modeling? What do the algorithms look like? Is your code ready or what would you need to get ready?" In general, leadership computing systems are for scientists who can't advance their science easily without such resources. The scientists have the burden of proving that they need the full system resources to do their research. This process is carried out in a peer-reviewed fashion through the INCITE Program.
The information collected from the surveys was sent to ASCR, the DOE Program Office (http://www.sc.doe.gov/ascr/) whose mission is to deliver leadership computing capabilities to scientists. According to Kothe, six codes have been selected that they believe can be ready when the 250-teraflop system is installed. The application areas include combustion science, astrophysics, fusion energy, chemistry, materials science/nanoscience, and climate. The code teams are gearing up in anticipation.
The same sorts of plans have been started for the 2008 Baker system; they're just not as far along. But they've already polled many scientists on what they would do with the petaflops machine.
The application scale-up work relies on the availability of testbeds and simulators. "The sooner we can get our hands on the [Opteron] quad-core test beds, the better," said Kothe. "We think this will be in place in early summer." Fortunately, ORNL already has Jaguar, a large dual-core Opteron system. So the transition should be pretty smooth and hopefully without too many last-minute surprises.
The real challenge for the applications will be to use as much of the new systems' computing power as possible. This is the classic problem for HPC applications. As the number of computing cores grows, it often outstrips the ability of applications to parallelize their work. The petaflops Baker system is expected to contain over 22,000 quad-core processors.
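To put rough numbers on that scaling problem, one common yardstick is Amdahl's law. The article does not invoke it; it is used here only as an illustration. The sketch below assumes 88,000 cores (22,000 quad-core processors, a lower bound given the "over 22,000" figure), and the parallel fractions are hypothetical values, not measurements from any ORNL code.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law for a code whose runtime is
    `parallel_fraction` parallelizable, run on `cores` cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Core count implied by the article: 22,000 quad-core processors.
# This is a lower bound, since the article says "over 22,000".
BAKER_CORES = 22_000 * 4

# Hypothetical parallel fractions, chosen only for illustration.
for p in (0.99, 0.999, 0.9999, 0.99999):
    speedup = amdahl_speedup(p, BAKER_CORES)
    efficiency = speedup / BAKER_CORES
    print(f"parallel fraction {p}: speedup {speedup:,.0f}x, "
          f"efficiency {efficiency:.1%}")
```

Under this simplified model, even a code that is 99 percent parallel tops out at a speedup of under 100 on such a machine. That is why the code teams' preparation matters as much as the hardware. Real applications also face communication and I/O costs that the model ignores.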