June 10, 2009
Scientists demonstrate tools for analyzing massive datasets
June 10 -- As computational scientists are confronted with increasingly massive datasets from supercomputing simulations and experiments, one of the biggest challenges is having the right tools to gain scientific insight from the data. A team of Department of Energy (DOE) researchers recently ran a series of experiments to determine whether VisIt, a leading scientific visualization application, is up to the challenge. Running on some of the world's most powerful supercomputers, VisIt achieved unprecedented levels of performance in these highly parallel environments, tackling datasets far larger than scientists are currently producing.
The team ran VisIt using 8,000 to 32,000 processing cores to tackle datasets ranging from 500 billion to 2 trillion zones, or grid points. The project was a collaboration among leading visualization researchers from Lawrence Berkeley National Laboratory (Berkeley Lab), Lawrence Livermore National Laboratory (LLNL) and Oak Ridge National Laboratory (ORNL).
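To put these ranges in perspective, the short calculation below estimates the average load on each core. It is only a sketch: it assumes the smallest dataset ran on the fewest cores and the largest on the most (the article gives the two ranges but not the pairings), and it assumes one 4-byte scalar value per zone. The constant `BYTES_PER_ZONE` and both function names are hypothetical.

```python
# Average per-core load for the two ends of the ranges quoted above.
# ASSUMPTION: the 500-billion-zone run used 8,000 cores and the 2-trillion-zone
# run used 32,000 cores; the article gives the ranges but not these pairings.
# ASSUMPTION: one 4-byte (32-bit float) scalar value per zone.

BYTES_PER_ZONE = 4  # hypothetical: single-precision scalar field


def zones_per_core(total_zones: int, cores: int) -> float:
    """Average number of zones each core holds if the grid is split evenly."""
    return total_zones / cores


def megabytes_per_core(total_zones: int, cores: int) -> float:
    """Average memory per core for one scalar field, in decimal megabytes."""
    return zones_per_core(total_zones, cores) * BYTES_PER_ZONE / 1e6


if __name__ == "__main__":
    runs = [
        (500_000_000_000, 8_000),     # 500 billion zones
        (2_000_000_000_000, 32_000),  # 2 trillion zones
    ]
    for zones, cores in runs:
        print(
            f"{zones:,} zones on {cores:,} cores: "
            f"{zones_per_core(zones, cores):,.0f} zones/core, "
            f"~{megabytes_per_core(zones, cores):,.0f} MB/core"
        )
```

Under these assumptions both ends of the range work out to 62.5 million zones (about 250 MB for one scalar field) per core, which is what a weak-scaling study, with work per core held constant as cores are added, would look like. Whether the team designed the runs that way is not stated.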
Specifically, the team verified that VisIt could take advantage of the growing number of cores powering the world's most advanced supercomputers, using them to tackle unprecedentedly large problems. Scientists confronted with massive datasets rely on data analysis and visualization software such as VisIt to "get the science out of the data," as one researcher said. VisIt, a parallel visualization and analysis tool that won an R&D 100 award in 2005, was developed at LLNL for the National Nuclear Security Administration.
When DOE established the Visualization and Analytics Center for Enabling Technologies (VACET) in 2006, the center joined the VisIt development effort, extending the software for use on the large, complex datasets emerging from DOE's Scientific Discovery through Advanced Computing (SciDAC) program. VACET is part of SciDAC and includes researchers from the University of California at Davis and the University of Utah, as well as Berkeley Lab, LLNL and ORNL.
The VACET team conducted the recent capability experiments in response to its mission to provide production-quality, parallel-capable visual data analysis software. These tests were a significant milestone for DOE's visualization efforts, providing an important new capability for the larger scientific research communities.
"The results show that visualization research and development efforts have produced technology that is today capable of ingesting and processing tomorrow's datasets," said Berkeley Lab's E. Wes Bethel, who is co-leader of VACET. "These results are the largest-ever problem sizes and the largest degree of concurrency ever attempted within the DOE visualization research community."
Other team members are Mark Howison and Prabhat from Berkeley Lab; Hank Childs, who began working on the project while at LLNL and has now joined Berkeley Lab; and Dave Pugmire and Sean Ahern from ORNL. All are members of VACET, as well.
The VACET team ran the experiments in April and May on several world-class supercomputers.
To run these tests, the VACET team started with data from an astrophysics simulation and scaled it up to create a sample scientific dataset at the desired dimensions. The team took this approach because datasets of these sizes reflect tomorrow's problem sizes, and because the primary objective of the experiments was to better understand the problems and limitations that might be encountered at extreme levels of concurrency and data size.
The test runs created three-dimensional grids ranging from 512 x 512 x 512 "zones" or sample points up to approximately 10,000 x 10,000 x 10,000 samples for 1 trillion zones and approximately 12,500 x 12,500 x 12,500 to achieve 2 trillion grid points.
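The grid sizes quoted above can be checked with a few lines of arithmetic: a cube with n zones along each edge holds n × n × n zones. The helper names below are illustrative only.

```python
# Sanity-check the cubic grid sizes quoted in the article.
# A cube with `edge` zones along each axis contains edge**3 zones.


def zones_in_cube(edge: int) -> int:
    """Total zones in an edge x edge x edge grid."""
    return edge ** 3


def cube_edge(total_zones: float) -> float:
    """Edge length of a cubic grid holding `total_zones` zones."""
    return total_zones ** (1.0 / 3.0)


if __name__ == "__main__":
    for edge in (512, 10_000, 12_500):
        print(f"{edge:,}^3 = {zones_in_cube(edge):,} zones")
    for total in (500e9, 1e12, 2e12):
        print(f"{total:,.0f} zones -> edge of about {cube_edge(total):,.0f}")
```

The smallest grid, 512 x 512 x 512, holds about 134 million zones. A 12,500-zone cube holds about 1.95 trillion zones, which is why the article says "approximately" 2 trillion; an exact 2-trillion-zone cube would be roughly 12,600 zones on a side.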
"This level of grid resolution, while uncommon today, is anticipated to be commonplace in the near future," said Ahern. "A primary objective for our SciDAC Center is to be well prepared to tackle tomorrow's scientific data understanding challenges."
The experiments ran VisIt in parallel on 8,000 to 32,000 cores, depending on the size of the system. Data was loaded in parallel; the application then performed two common visualization tasks, isosurfacing and volume rendering, and produced an image. From these experiments, the team collected performance data that will help them both identify potential bottlenecks and optimize VisIt before the next major version is released for general production use at supercomputing centers later this year.
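Loading a single huge grid in parallel generally means giving each core its own piece of it. The article does not describe how VisIt partitions its data, so the sketch below assumes the simplest case: a regular block ("brick") decomposition of a cubic grid over a 3D grid of processes. The functions `split_axis` and `brick_for_rank` and the 20 x 20 x 20 process layout are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open index range [start, stop)


def split_axis(n: int, parts: int) -> List[Range]:
    """Split n zones along one axis into `parts` contiguous, nearly equal ranges."""
    base, extra = divmod(n, parts)
    ranges: List[Range] = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` parts get one more zone
        ranges.append((start, start + size))
        start += size
    return ranges


def brick_for_rank(rank: int, n: int, procs: Tuple[int, int, int]) -> Tuple[Range, Range, Range]:
    """Return the (x, y, z) index ranges that `rank` would read from an n^3 grid.

    Ranks are laid out on a px x py x pz process grid, with x varying fastest.
    """
    px, py, pz = procs
    if not 0 <= rank < px * py * pz:
        raise ValueError("rank outside the process grid")
    ix = rank % px
    iy = (rank // px) % py
    iz = rank // (px * py)
    return split_axis(n, px)[ix], split_axis(n, py)[iy], split_axis(n, pz)[iz]


if __name__ == "__main__":
    # Hypothetical layout: a 10,000^3 grid (1 trillion zones) on 20 x 20 x 20 = 8,000 ranks.
    for rank in (0, 7_999):
        print(rank, brick_for_rank(rank, 10_000, (20, 20, 20)))
```

In this hypothetical layout each rank reads a 500 x 500 x 500 brick of 125 million zones. In a design like this, each rank would then isosurface or volume render its own brick, and the partial results would be combined into one image; the article does not say how VisIt performs that final step.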
Another purpose of these runs was to help establish VisIt as a "Joule code," a code that has demonstrated scalability at a large number of cores. DOE's Office of Advanced Scientific Computing Research (ASCR) is establishing a set of such codes to serve as a metric for tracking code performance and scalability as supercomputers are built with tens and hundreds of thousands of processor cores. VisIt is the first and only visual data analysis code that is part of the ASCR Joule metric.
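The article does not say how the Joule metric quantifies scalability. One common way to express it is parallel efficiency, shown below in its strong-scaling form (fixed total problem size) and weak-scaling form (fixed work per core). This is a generic sketch, not ASCR's definition; the function names and the timings in the example are hypothetical placeholders, not results from the VisIt runs.

```python
# Two common ways to express parallel efficiency from measured run times.
# The timings in the example below are hypothetical placeholders.


def strong_scaling_efficiency(t_base: float, p_base: int, t: float, p: int) -> float:
    """Fixed total problem size: ideally, time falls in proportion to cores.

    Returns 1.0 for perfect scaling from p_base to p cores.
    """
    return (t_base * p_base) / (t * p)


def weak_scaling_efficiency(t_base: float, t: float) -> float:
    """Fixed work per core (problem grows with cores): ideally, time stays flat.

    Returns 1.0 when the larger run takes exactly as long as the baseline.
    """
    return t_base / t


if __name__ == "__main__":
    # Hypothetical timings, in seconds, for 8,000 vs. 32,000 cores.
    print(strong_scaling_efficiency(100.0, 8_000, 30.0, 32_000))
    print(weak_scaling_efficiency(100.0, 125.0))
```

A code that keeps either number close to 1.0 as the core count grows is said to scale well; which form applies depends on whether the problem size is held fixed or grown with the machine.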
VisIt is currently running on six of the world's top eight supercomputers, and the software has been downloaded by more than 100,000 users. For more information about VisIt, visit http://visit.llnl.gov/about.html. To learn more about VACET, go to http://www.vacet.org/.
* DOE's Scientific Discovery through Advanced Computing Program (SciDAC)
* Kathy Yelick, Francesca Verdier, and Howard Walter, National Energy Research Scientific Computing Center (NERSC), Berkeley Lab
* Paul Navratil, Kelly Gaither, and Karl Schulz, Texas Advanced Computing Center, University of Texas, Austin
* James Hack, Doug Kothe, Arthur Bland, and Ricky Kendall, Oak Ridge Leadership Computing Facility, ORNL
* David Fox, Debbie Santa Maria, and Brian Carnes, Livermore Computing, LLNL
Source: Lawrence Berkeley National Laboratory