At its eighteenth meeting last week in Richmond, Virginia, the HPC User Forum welcomed a record 133 participants from the U.S., Europe and Japan, and treated them to examples of leadership computing and a technical discussion of HPC storage systems and data management.
The HPC community is invited to attend upcoming HPC User Forum meetings: May 29-30 in Bologna, Italy (hosted by CINECA); June 1-2 at ETH in Zurich; and September 18-20 in Denver, Colorado. The European meetings follow a different format and are free of charge. The meetings are co-sponsored by HPCwire.
IDC's Earl Joseph, who serves as executive director of the HPC User Forum, thanked the users for attending and providing 28 excellent end-user presentations, and the vendors who sponsored the meals, including HP, Intel, IBM, Linux Networx, The Portland Group and Microsoft. He said the HPC purchasing power of the users at the meeting exceeded $1 billion. For the past three years, the HPC market has been the fastest-growing IT market IDC tracks, achieving 94 percent aggregate growth since 2002 and increasing 24 percent in 2005 alone to reach $9.2 billion in revenues. HP and IBM were virtually tied for first place, with Dell in third position. Clusters have been a disruptive force (71 percent growth in 2005) and now represent nearly half the market, while the capability- and enterprise-class segments continue their modest decline. The workgroup segment, for systems priced at $50,000 and under, has grown 200 percent since 2002.
IDC is interested in hearing about users' impressions of multi-core processors and accelerators. Email Earl Joseph, [email protected].
Steering Committee Chairman Paul Muzio, VP-Government Programs for Network Computing Services, Inc. and Support Infrastructure Director of the Army High Performance Computing Research Center, welcomed the Richmond participants. He noted that as the HPC industry approaches petascale computing, many users feel storage has lagged behind HPC system capabilities. The storage topic will be continued at the September meeting. Regarding the AHPCRC, he said research areas include light combat vehicles, vehicle armor, bunker busters, penetrators and fuzes. Body armor is a serious concern: it's heavy, warm and needs to be more flexible. The goal is to develop lighter, equally survivable armor.
Simon Szykman, director of the National Coordination Office for Networking and Information Technology Research and Development (NITRD), noted the Administration's strong support for supercomputing, evidenced by increases in President Bush's 2007 proposed budgets for NSF (12 percent), DOE-Science (35 percent) and NIST (10 percent), and calls for these budgets to be doubled over the next decade. Szykman applauded the HPC User Forum's focus on storage performance as well as cost, showing data that Moore's Law, Linpack, and hard drive capacity and cost have all been increasing at much higher rates than storage technology performance.
Doug Ball said Boeing uses CFD to design major portions of planes today, inside and out. Simulation saves time and money, and test data don't provide as much insight. HPC-based simulation has given Boeing competitive advantages in many areas, for example by reducing the spacing needed between planes at take-off. With reduced spacing, more takeoffs and landings are possible, plus flights are safer. Among remaining challenges: “fly” the Navier-Stokes equations and not a database; computationally determine the acoustic signature of an airplane in one day; and achieve true multidisciplinary design optimization.
Takeshi Yamaguchi said the main products of Aisin AW Co., Ltd. are automatic transmissions and GPS navigation systems. Aisin had 50 percent global market share for automatic transmissions in 2005 (5 million units) and developed the first six-speed automatic transmission for Lexus, VW and Porsche, as well as hybrid systems for Ford. Aisin had 40 percent global market share in 2005 for GPS systems. Aisin began using CFD analysis for torque converters in 1996 and most recently acquired a 48-processor Linux Networx system. Models with more than 10 million elements will be needed for detailed interior design of automatic transmissions.
Dolores Shaffer of Science and Technology Associates (STA) reviewed the DARPA HPCS Program, noting that Phase II vendors will soon be submitting their proposals for Phase III. Some vendors are now trying to achieve petaflop computing in 2008. Users want petascale capability to perform more detailed simulations, with more data, and to do multi-physics multi-scale problems. The memory wall is growing, and system size is an increasing issue. She noted that more users are employing the HPC Challenge benchmark suite in their procurements. Summer 2006 is the estimated start timeframe for Phase III, in which one or two vendors will be selected to finish their designs and build prototype systems.
Keith Gray, British Petroleum (BP), reported that exploring the Thunder Horse field in the Gulf of Mexico (potential reserves: 1 billion barrels) will require the same order of investment as a new Intel processor plant. HPC has reduced Thunder Horse migration from 3 weeks to 1 day and made BP the energy-industry leader in 64-bit computing. BP's strategy is to lease computers and remain agile enough to take advantage of platform breakthroughs.
Jeff Nichols, Oak Ridge National Laboratory, summarized two important global collaborations involving HPC: the Intergovernmental Panel on Climate Change (IPCC), and ITER, which is exploring plasma energy as a virtually unlimited source of clean electricity. ORNL is the lead U.S. organization for ITER and has done a substantial portion of the U.S. computational work for IPCC. ORNL plans to upgrade its HPC systems to 100 TF in 2006, 250 TF in 2007, and 1 PF perhaps as soon as 2008.
Liquid Computing's Mike Kemp gave a new HPC vendor update. The company plans its product launch in June 2006, with general availability in August 2006. The product initially will be sold with up to 12 chassis but is designed to scale to a petaflop. It can include co-processors (FPGAs and others) to complement best-in-class microprocessors. It supports up to 16 GB of interconnect per chassis, with latency of about 2 usec between chassis across the network. Each compute module has 4 Opterons with RAM.
Paul Muzio introduced the technical sessions on storage, noting that they would address shared file systems software, hierarchical storage systems software, and hardware media, and that users and vendors would be asked, “If this were 2008, what would you need for petascale HPC systems to be delivered in 2011?” He thanked Brad Blasing of AHPCRC-NSCI for organizing the storage sessions.
Henry Newman, Instrumental, said storage requirements for the 2010-11 timeframe were collected from NRO, NSA, DOE-SC, NNSA, HPCMP and NASA. This led to 12 different I/O scenarios. High speed I/O will be the Achilles heel of petascale computing. Why not use some of the same technologies as earlier, such as vectors and multithreading, to address some of these I/O issues? If it's true for memory, it's true for I/O.
Brad Blasing said the AHPCRC has three different storage systems today (direct attach disk, home directory space, infrastructure space) and wants a single system by 2011, with 3 to 4 PBs, speed configurable to 100 GB/s, and able to retain up to 1 billion files for years. Since 1985, the price for 1 GB of storage has dropped from $100,000 to about $3. Extrapolating to 2011, this would mean 2 to 3 PBs would cost about $1 million.
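The extrapolation above can be roughly checked with a short calculation. This is only a sketch under stated assumptions: it takes "today" to be 2006, assumes the per-GB price falls by a constant factor every year, and uses decimal units (1 PB = 1,000,000 GB). The function name and variables are illustrative and do not come from the presentation.

```python
def extrapolate_price(p_start, year_start, p_now, year_now, year_target):
    """Project a price forward, assuming a constant annual decline factor."""
    # Annual factor implied by the two known price points (assumption: geometric decline).
    annual_factor = (p_now / p_start) ** (1.0 / (year_now - year_start))
    return p_now * annual_factor ** (year_target - year_now)

# Figures quoted in the talk: $100,000 per GB in 1985, about $3 per GB today.
# Assumption: "today" is 2006.
price_2011 = extrapolate_price(100_000.0, 1985, 3.0, 2006, 2011)

# Petabytes that $1 million would buy in 2011 (decimal units: 1 PB = 1e6 GB).
pb_per_million = 1_000_000.0 / price_2011 / 1_000_000.0

print(f"projected 2011 price: ${price_2011:.2f} per GB")
print(f"$1 million buys about {pb_per_million:.1f} PB")
```

Under these assumptions the projection lands at a few petabytes per $1 million, the same order of magnitude as the 2 to 3 PBs quoted; a different choice of end year or decline model would shift the result.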
Michael Knowles said the Army Research Laboratory was initially using a failover system with Veritas software and now has standalone active servers. On the disk side, ARL is looking at going to a fourth generation of EMC disks. For the 2011 file system, ARL would want to move HSM straight out to the center fabric and get away from the multiple copies. This implies a scalable, redundant switched network.
According to Hank Heeb, Boeing uses an archival file service (HSM) for long-term retention and is trying to centralize on a single type of file system. By 2011, Boeing expects to grow to at least 500 TBs and have a high-speed, object-based parallel file system with HSM, with the ability to expand dynamically at any time.
Sharan Kalwani, General Motors Corporation, said today it's hard to meld together all the technologies and get 99.97 percent or better uptime. He expects to continue to work with many vendors. For CAE, crash is the biggest application and the data volume is exploding. GM ran 100 simulations/month 3 years ago, and is running 3,000/month today. This may grow to 10,000/month in a few years. Five years ago, 12 to 16 GB/s I/O performance was enough; today it's 160 GB/s. Scalability, security, robustness, data access and data content all are bigger issues today than five years ago.
Bill Thigpen, NASA Ames Research Laboratory, said moving data to/from Columbia has been a challenge. The lab is testing a new process for unattended file transfer. The file system is evolving from direct connect to disks, to a shared file system. In 2011, Ames expects to store 125 TB/day and retrieve 25 TB/day, with 25 PB archive RAID cache, 200 Gb/s across the network, a 200 PB tape media archive, and 1 billion files and directories.
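To put the 2011 targets on a common scale, the daily volumes can be converted to sustained rates and compared with the network figure. The sketch below assumes decimal units (1 TB = 1,000 GB, 8 bits per byte) and traffic spread evenly over 24 hours, which real workloads will not do; the names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def tb_per_day_to_gb_per_s(tb_per_day):
    """Convert a daily volume in TB to an average rate in GB/s (decimal units)."""
    return tb_per_day * 1000.0 / SECONDS_PER_DAY

store_rate = tb_per_day_to_gb_per_s(125)     # average write rate to the archive
retrieve_rate = tb_per_day_to_gb_per_s(25)   # average read rate from the archive

# 200 Gb/s network expressed in GB/s.
network_gb_per_s = 200 / 8.0

# How many times the network figure exceeds the average combined traffic.
headroom = network_gb_per_s / (store_rate + retrieve_rate)

print(f"store: {store_rate:.2f} GB/s, retrieve: {retrieve_rate:.2f} GB/s")
print(f"network: {network_gb_per_s:.1f} GB/s, headroom: {headroom:.1f}x")
```

The gap between the average rate and the network figure is what would absorb bursts, since archive traffic is rarely uniform across the day.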
Carol Pasti said NOAA/NCEP needs to deliver over 200,000 products/day (weather forecasts and analysis) on time. They are a large IBM shop and have 3 PBs of tape storage. Future requirements include improved write performance on smaller sized file systems; easier disk management; improved metadata access (GPFS multi-clustering will replace NFS); and an archive that looks/acts/feels like a file system from all nodes in the cluster.
According to Paul Buerger, the Ohio Supercomputer Center's current environment includes a 400 TB SAN, plus Ext3/NFS, PVFS2 (scratch space), SAN FS (special purpose projects) and a database server. The number of files has tripled in the past three years (tens of millions today). “If we extrapolate to 2011, we're up around 1 petabyte of storage need and about 200 million files. Flexibility is the huge overriding future requirement, because our users don't ask us before doing what they do.”
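The file-count extrapolation can be sanity-checked by running it backwards. This sketch assumes the tripling-every-three-years trend simply continues and that "today" is 2006; it then backs out the current file count implied by the 200 million figure for 2011. The function and variable names are illustrative.

```python
def growth_factor(multiple, period_years, horizon_years):
    """Compound growth over a horizon, given a multiple observed over one period."""
    return multiple ** (horizon_years / period_years)

# Files tripled over three years; project five years ahead (2006 to 2011).
factor_to_2011 = growth_factor(3.0, 3.0, 2011 - 2006)

# Current file count implied by the quoted 2011 estimate of 200 million files.
implied_files_today = 200e6 / factor_to_2011

print(f"growth factor to 2011: {factor_to_2011:.2f}x")
print(f"implied file count today: {implied_files_today / 1e6:.0f} million")
```

Under these assumptions the implied current count falls in the tens of millions, which is consistent with the figure given in the talk.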
Nick Nystrom noted that the Pittsburgh Supercomputer Center heads user support and security for the Teragrid. PSC has already begun to plan as far out as 2011, driven by requirements for efficient application I/O, checkpoint/restart and data movement across the Teragrid. By this timeframe, PSC expects to require 1 GB/s of bandwidth per TF of computation.
Bryan Banister, San Diego Supercomputer Center, said the center is part of the Teragrid and, as an academic center, runs many types of applications. Earthquake simulations alone generate 47 TBs of data per week. The Storage Data Broker is a metadata resource. By 2011, he expects archival data to grow to more than 100 PBs.
Priya Vashista, University of Southern California, uses HPC to simulate the atomic and molecular behavior of materials, modeling millions of atoms at a time. For nanophase highly energetic materials (a.k.a. explosives), the goal is for the explosive to be stable yet provide maximum energy when ignited. For hypervelocity impact damage, HPC is used to study the cascade of processes that occur following impact. USC's Chevron-supported Center for Interactive Smart Oil Field Technologies uses HPC for reservoir management and model validation.
Major Kevin Benedict explained that the Maui High Performance Computing Center is part of the Air Force Research Lab energy directorate and a distributed center for DOD HPCMP. Customers also include the Navy, Army, Marine Corps and others. Focus areas: image and signal processing (enhance electro-optical images from telescopes; detect very small satellites through telemetry and other methods; database fusion and management technologies); battlefield modeling and simulation; system and software integration.
DOE's Pacific Northwest National Laboratory, Kevin Regimbal reported, uses HPC, including its 11.8 TF HP Itanium system, for the NNSA-funded Energy-Smart Data Center (ESDC) program, which is trying to save energy (and improve performance) by cooling supercomputers more efficiently. PNNL is working with SprayCool (adaptive spray cooling). This could make computers much smaller, denser. PNNL converted HP RX2600 nodes to SprayCool and did this to a full rack of HP servers.
Brad Blasing moderated part 2 of the storage sessions, this time asking storage vendors the same questions users weighed in on earlier.
Sun/StorageTek's Harriet Covertson reviewed Sun's StorEdge Shared QFS, which is fully integrated with Sun's HSM, SAM-FS. Sun is moving to object-based storage, which decouples the physical storage technology from applications and file systems. She said this is a paradigm shift.
Yung Yip reported that in 2011, Imation will be shipping a 4 TB cartridge and will move to 12 TBs in 2016. Transfer rate advances will not be as rosy, probably 300 MB/s in 5 years and 800 MB/s in 10. There are no insurmountable challenges in implementing this roadmap. Immediate challenges include mechanical scaling, media noise and head-to-tape spacing. Imation is also working on tape/HDD hybrid cartridges.
IBM's Bob Curran said that today, cluster file systems are used well beyond big science. In 2011, disk drive heads will move to the TB range, but with no massive increase in rotational speeds (10 GB/s). General-purpose networks will operate at 30 GB/s or more, over greater distances. This will enable larger and faster file systems. New applications will probably drive where we need to go.
Dave Ellis said that Engenio is part of LSI Logic today. Data is growing, making huge demands on the data center. IT costs are constrained. I/O interfaces and storage protocols are the foundation for the future. Multicore processors are also an important factor.
David Fellinger, Data Direct Networks, said drives have become worse, because areal density has been increasing. The true performance of drives is limited not just by seek time but by error recovery. DDN solves this problem with state machine logic, so data never is handled by a CPU. DDN can do parallel processing of data and error recovery on the fly.
COPAN Systems' Aloke Guha said bandwidth is improving but not latency, so the only way to go is tiering (tiered options such as SSD-Disk-MAID-tape). HSM and tiered storage will evolve to become more content-aware. For large-scale backup, restore is the challenge. For long-term preservation, format is an issue. It's hard to read older tapes in non-current formats.
Jeffrey Denworth, Cluster File Systems, argued that Lustre can deploy storage in almost any way you want. Two key technologies for petascale systems are clustered metadata and client write-back metadata cache. These technologies also enable more enterprise-style functionality. He expects no completely analytical file system within the next 5 years.
Rich Altmaier said SGI's approach has been to provide various storage solutions. In the future, storage architectures should be cleanly layered; you need to assume indexing and tracking databases will contain bugs and data corruption; managing failures of the hardware elements and connectivity will be vital; and notions of virtualization are very valid.
HP's Scott Misage sees three dimensions: (1) users are driving requirements for file sizes and latencies; (2) there is geographical dispersion via grids, multinationals, utilities; and (3) system complexity is growing enormously. Exa-scale challenges: How do you manage a 100K-node storage system? How do you share 100K nodes? HP Labs is doing early work on self-managing storage systems.
IDC's Addison Snell reviewed dates and locations for HPC User Forum meetings for the rest of 2006 and early 2007.
Earl Joseph moderated a session on HPC collaborations. The initial speaker, Suzy Tichenor, Council on Competitiveness, said HPC is a key ingredient in America's innovation capacity and reviewed key findings from recent Council studies that showed HPC is essential to business survival. Lack of scalable, production-quality HPC application software is holding industry back from using HPC more aggressively for competitive advantage. There is an opportunity to accelerate application software development through partnerships.
Jeff Candy discussed General Atomics' work on plasma reactions in a tokamak (General Atomics is a participant in DOE's INCITE program). The ultimate goal is to obtain clean energy from fusion, and the largest project related to this is ITER. The company's GYRO code calculates the turbulent flow of heat and particles in reactor plasmas.
John West, Engineer Research and Development Center (ERDC), said continuing to use the command line as the primary interface to most supercomputers “will cost us competitively in the U.S.” What's needed to make HPC usage more pervasive, especially among new graduates, is a friendlier, more flexible GUI for which Google might serve as a model.
Kent Misegades said CEI's EnSight visualization tools and Harpoon meshers prove that an ISV dedicated exclusively to the HPC market can be very successful. What ISVs need are early adopters to share risk; access to HPC systems and support from HW vendors; funded development in the absence of adequate market demand; IP rights; no open source requirement; and government acting to help, not compete, with ISVs.
Dolores Shaffer described opportunities for involvement and partnerships within the DARPA HPCS program. She invited people to make proposals to DARPA. If you have a really strong idea, join DARPA to make it happen.
Sukumar Chakravarthy, Metacomp Technologies, reviewed the American Rocket Car Project, whose goal is to recapture the land-speed vehicle record for the U.S. The American vehicle is 47 feet long and has 120,000 horsepower (powered by a hybrid rocket engine). Metacomp is using its own meshing technology and generated an 8.2 million-prism tetrahedral mesh. Their job is to analyze the design.
In the final session, Chevron, BP and NOAA discussed dealing with disasters, especially experiences related to Hurricane Katrina.
Mike Netzband, Chevron, stressed the importance of detailed advance preparation. You need to recognize that information is a corporate asset and treat it accordingly; understand what information really is critical; create actionable plans to replicate, mirror, back up data; test the plans; and, most important, act at an appropriate time, rather than wait.
Keith Gray, British Petroleum, said the question becomes, what can you do on a reasonable budget? Based on BP's Katrina experience, the company will plan for staff needs (hotel reservations, evacuation routes, how to stay in touch); clarify expectations and plans; understand options for short-term alternative sites; replicate source code; and work to improve near-line capabilities.
Gary Wohl said NOAA/NCEP thinks in terms of continuity of operations, rather than disaster recovery. NCEP's mission is all weather, all the time. NCEP needs to assure its whole customer base, which includes first responders, that they can get the information they need to complete their missions. The center runs drills and has established two continuity of operations (COOP) sites. Future plans include increasing the physical separation between primary and backup sites.
Earl Joseph thanked everyone for attending and invited everyone to join future HPC User Forum meetings, which are posted at www.hpcuserforum.com.