The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
June 10, 2009
Scientists demonstrate tools for analyzing massive datasets
June 10 -- As computational scientists are confronted with increasingly massive datasets from supercomputing simulations and experiments, one of the biggest challenges is having the right tools to gain scientific insight from the data. A team of Department of Energy (DOE) researchers recently ran a series of experiments to determine whether VisIt, a leading scientific visualization application, is up to the challenge. Running on some of the world's most powerful supercomputers, VisIt achieved unprecedented levels of performance in these highly parallel environments, tackling datasets far larger than scientists are currently producing.
The team ran VisIt using 8,000 to 32,000 processing cores to tackle datasets ranging from 500 billion to 2 trillion zones, or grid points. The project was a collaboration among leading visualization researchers from Lawrence Berkeley National Laboratory (Berkeley Lab), Lawrence Livermore National Laboratory (LLNL) and Oak Ridge National Laboratory (ORNL).
Specifically, the team verified that VisIt could take advantage of the growing number of cores powering the world's most advanced supercomputers, using them to tackle problems of unprecedented size. Scientists confronted with massive datasets rely on data analysis and visualization software such as VisIt to "get the science out of the data," as one researcher said. VisIt, a parallel visualization and analysis tool that won an R&D 100 award in 2005, was developed at LLNL for the National Nuclear Security Administration.
When DOE established the Visualization and Analytics Center for Enabling Technologies (VACET) in 2006, the center joined the VisIt development effort, making further extensions for use on the large, complex datasets emerging from the SciDAC program. VACET is part of DOE's Scientific Discovery through Advanced Computing (SciDAC) program and includes researchers from the University of California at Davis and the University of Utah, as well as Berkeley Lab, LLNL and ORNL.
The VACET team conducted the recent capability experiments in response to its mission to provide production-quality, parallel-capable visual data analysis software. These tests were a significant milestone for DOE's visualization efforts, providing an important new capability for the larger scientific research communities.
"The results show that visualization research and development efforts have produced technology that is today capable of ingesting and processing tomorrow's datasets," said Berkeley Lab's E. Wes Bethel, who is co-leader of VACET. "These results are the largest-ever problem sizes and the largest degree of concurrency ever attempted within the DOE visualization research community."
Other team members are Mark Howison and Prabhat from Berkeley Lab; Hank Childs, who began working on the project while at LLNL and has since joined Berkeley Lab; and Dave Pugmire and Sean Ahern from ORNL. All are also members of VACET.
The VACET team ran the experiments in April and May on several world-class supercomputers.
To run these tests, the VACET team started with data from an astrophysics simulation and then scaled it up to create a sample scientific dataset at the desired dimensions. The team used this approach because the resulting data sizes reflect tomorrow's problem sizes, and because the primary objective of these experiments is to better understand the problems and limitations that might be encountered at extreme levels of concurrency and data size.
The test runs created three-dimensional grids ranging from 512 x 512 x 512 "zones," or sample points, up to approximately 10,000 x 10,000 x 10,000 samples for 1 trillion zones and approximately 12,500 x 12,500 x 12,500 samples for 2 trillion zones.
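The grid dimensions quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function name `zone_count` is purely illustrative and is not part of VisIt.

```python
# Illustrative arithmetic only: zone counts for the cubic grids quoted in the article.
# `zone_count` is a hypothetical helper, not a VisIt function.

def zone_count(edge: int) -> int:
    """Return the number of zones in a cubic grid with `edge` samples per side."""
    return edge ** 3

for edge in (512, 10_000, 12_500):
    print(f"{edge:>6,} per side -> {zone_count(edge):,} zones")
```

A 10,000-sample edge gives exactly 1 trillion zones, while a 12,500-sample edge gives about 1.95 trillion, which the article rounds to 2 trillion.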
"This level of grid resolution, while uncommon today, is anticipated to be commonplace in the near future," said Ahern. "A primary objective for our SciDAC Center is to be well prepared to tackle tomorrow's scientific data understanding challenges."
The experiments ran VisIt in parallel on 8,000 to 32,000 cores, depending on the size of the system. Data was loaded in parallel, with the application performing two common visualization tasks--isosurfacing and volume rendering--and producing an image. From these experiments, the team collected performance data that will help them both to identify potential bottlenecks and to optimize VisIt before the next major version is released for general production use at supercomputing centers later this year.
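To get a feel for the scale of each core's share of the work, the sketch below divides a dataset evenly across cores. It rests on two assumptions that the article does not state: that zones are split evenly, and that the smallest and largest datasets were paired with the smallest and largest core counts. The helper name `zones_per_core` is hypothetical.

```python
# Illustrative arithmetic only. Assumes an even split of zones across cores and
# a pairing of dataset sizes with core counts that the article does not specify.

def zones_per_core(total_zones: int, cores: int) -> int:
    """Return the per-core share of zones, rounded up (ceiling division)."""
    return -(-total_zones // cores)

# Hypothetical pairings: (total zones, core count).
runs = [
    (500 * 10**9, 8_000),    # 500 billion zones on 8,000 cores
    (2 * 10**12, 32_000),    # 2 trillion zones on 32,000 cores
]

for total, cores in runs:
    print(f"{total:,} zones / {cores:,} cores = {zones_per_core(total, cores):,} zones per core")
```

Under these assumptions both ends of the range work out to 62.5 million zones per core, so the per-core load would stay constant as the problem and the machine grow together.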
Another purpose of these runs was to prepare for establishing VisIt's credentials as a "Joule code," or a code that has demonstrated scalability at a large number of cores. DOE's Office of Advanced Scientific Computing Research (ASCR) is establishing a set of such codes to serve as a metric for tracking code performance and scalability as supercomputers are built with tens and hundreds of thousands of processor cores. VisIt is the first and only visual data analysis code that is part of the ASCR Joule metric.
VisIt is currently running on six of the world's top eight supercomputers, and the software has been downloaded by more than 100,000 users. For more information about VisIt, visit http://visit.llnl.gov/about.html. To learn more about VACET, go to http://www.vacet.org/.
Acknowledgments
* DOE's Scientific Discovery through Advanced Computing Program (SciDAC)
* Kathy Yelick, Francesca Verdier, and Howard Walter, National Energy Research Scientific Computing Center (NERSC), Berkeley Lab
* Paul Navratil, Kelly Gaither, and Karl Schulz, Texas Advanced Computing Center, University of Texas, Austin
* James Hack, Doug Kothe, Arthur Bland, and Ricky Kendall, Oak Ridge Leadership Computing Facility, ORNL
* David Fox, Debbie Santa Maria, and Brian Carnes, Livermore Computing, LLNL
-----
Source: Lawrence Berkeley National Laboratory