HPCwire

Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them

Developing Scientific Computing Communities


Researchers present experiences from ENZO, CACTUS, and iPlant API development efforts

By Aaron Dubrow, Texas Advanced Computing Center

Decades of scholarship and billions of dollars have gone into the development of community software codes that are crucial not only to science but also to our everyday lives and our future.

The General Circulation Model, used by the Intergovernmental Panel on Climate Change to model our future environment, and the Weather Research and Forecasting model, which helps predict extreme weather, are two key examples. Others, like CHARMM and NAMD, are used by researchers and pharmaceutical companies to find drug leads and to better understand disease.

Nearly every field of science has a community code (or several) that satisfies a large percentage of the discipline's scientific needs. Great minds, and thousands of hours of PhD and postdoctoral labor, have gone into the creation of these codes. However, as new technologies emerge that can deliver millions of times the power of previous systems, it is often necessary to rethink and rewrite these community codes, which is no small feat.

What to do about community codes has been an open question in the scientific computing community for many years. The problem is described in the final report of the National Science Foundation's Task Force on Software for Science and Engineering published in March 2011.

"All software must evolve to keep up with changes in systems, usage, and to include new algorithms and techniques," the authors wrote. "The scientific community has an interest in ensuring that the software it needs will continue to be available, efficient, and employ state-of-the-art technology."

Several sessions at the TeraGrid '11 conference in Salt Lake City addressed this issue, pointing to successful examples of community, and community code, development. The talks covered technologies and methods that interact with high-performance computing hardware and software at very different levels of the architecture; nonetheless, each suggests a path that other scientific computing communities could follow.

Open Source Astrophysics

Brian O'Shea, assistant professor of physics and astronomy at Michigan State University, began his talk with a question: How do you transform a closed scientific computing code into a community code that can address the needs and harness the skills of a wide variety of researchers?

His talk traced the evolution of the astrophysics code Enzo from a black-box system that only a few people understood or could access, through a free-for-all in which divergent strains of the code proliferated, to its current state of controlled chaos, in which several dozen developers experiment with the code and contribute to its development, spurring rapid advances.

The new development workflow is "transparent to the users and easy to use," O'Shea said. "The result is that we have a very enthusiastic and involved user community. And it's sustainable."  

Enzo is used by a relatively small number of scientists, but they are among the most proficient users of HPC resources. According to O'Shea, approximately 60 Enzo users consumed 60 million computing hours on the TeraGrid in 2010 (an average of roughly one million hours each), leading to many astrophysical discoveries, including a better understanding of cosmic reionization.

An API to Feed the World

World governments and private industry are investing trillions of dollars in collecting data about plants in the hope of continuing to feed Earth's growing population. To date, however, these data collections have been scattered and difficult to connect.

To address this issue, the National Science Foundation funded a five-year, $50 million effort called "iPlant" to develop new tools, networks, and cyberinfrastructure that can connect plant biologists and bring their data together to spur insights and innovations.

Software developer Rion Dooley from the Texas Advanced Computing Center described the creation of a common application programming interface (API) for iPlant that allows researchers with little programming experience to add common functionality to their plant biology projects.

An API is a set of rules and specifications that software programs use to communicate with one another. It serves as an interface between programs and facilitates their interaction, much as a user interface facilitates interaction between humans and computers.
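To make the idea of an API as an agreed set of rules more concrete, here is a minimal Python sketch. It assumes a hypothetical sequence-lookup service; the names `SequenceStore`, `InMemoryStore`, and `gc_content` are illustrative only and are not part of iPlant or any real library. The "contract" is the `get_sequence` method: any provider that offers it can be used by any client written against it.

```python
from typing import Protocol


class SequenceStore(Protocol):
    """Hypothetical API contract: any provider must offer this method."""

    def get_sequence(self, accession: str) -> str:
        """Return the nucleotide sequence stored under an identifier."""
        ...


class InMemoryStore:
    """One hypothetical provider that satisfies the contract using a dict."""

    def __init__(self, records: dict[str, str]) -> None:
        self._records = records

    def get_sequence(self, accession: str) -> str:
        return self._records[accession]


def gc_content(store: SequenceStore, accession: str) -> float:
    """A client program: it depends only on the contract, not on how the store works."""
    seq = store.get_sequence(accession).upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


store = InMemoryStore({"demo1": "ATGCGC"})
print(gc_content(store, "demo1"))
```

Replacing `InMemoryStore` with a different provider, for example one that queried a remote service, would require no change to `gc_content`; that independence is what agreeing on an interface buys.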

Modeled after popular social and industry APIs such as those of Yelp and PayPal, the tools are intuitive, easy to use, and scalable on the very large high-performance computing systems of the Extreme Science and Engineering Discovery Environment (XSEDE). Among the most important API capabilities in iPlant are tools that let any user translate and integrate data stored in different file formats, enabling far greater collaboration.

"The API serves as a Rosetta stone for our users," Dooley said. "It gives them a way to collaborate with any other user without having to be fluent in every piece of software used in the plant bio community. And that's really the goal: to keep scientists focused on science rather than semantics."
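One common way to build this kind of "Rosetta stone" is a hub design, in which every file format is read into a single shared in-memory representation and written back out from it. The Python sketch below illustrates that general technique under stated assumptions; it is not iPlant's actual implementation. The two formats (FASTA and a hypothetical two-column tab-separated layout) and all function names are illustrative.

```python
# Hypothetical hub translator: every format is parsed into one shared
# representation (a list of (name, sequence) pairs) and written back out
# from it. An illustrative assumption, not iPlant's design.

Records = list[tuple[str, str]]


def read_fasta(text: str) -> Records:
    """Parse FASTA: '>' header lines followed by one or more sequence lines."""
    records: Records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def write_fasta(records: Records) -> str:
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def read_tsv(text: str) -> Records:
    """Parse a hypothetical 'name<TAB>sequence' format, one record per line."""
    records: Records = []
    for line in text.splitlines():
        if line.strip():
            name, seq = line.split("\t", 1)
            records.append((name, seq))
    return records


def write_tsv(records: Records) -> str:
    return "".join(f"{name}\t{seq}\n" for name, seq in records)


READERS = {"fasta": read_fasta, "tsv": read_tsv}
WRITERS = {"fasta": write_fasta, "tsv": write_tsv}


def translate(text: str, src: str, dst: str) -> str:
    """Convert between any two registered formats via the shared representation."""
    return WRITERS[dst](READERS[src](text))


print(translate(">geneA\nATG\nCCA\n>geneB\nTTT\n", "fasta", "tsv"))
```

The design choice is about scaling: with a shared intermediate form, supporting a new format takes one reader and one writer, rather than a separate converter to and from every format already supported.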

Modular Software for Community Growth

A third example of community code development was featured in a full-day tutorial at the conference centered on the Cactus computational framework, an open-source problem-solving environment for scientists and engineers. Its modular structure enables parallel computation across architectures and collaborative code development between groups.

Cactus originated in the academic research community, where it was developed and used over many years by a large international collaboration of physicists and computational scientists. Applications, developed on standard workstations or laptops, can seamlessly run on clusters or supercomputers.

The Cactus user community has created and maintained toolkits for several research fields. The Einstein Toolkit (described at length in the TeraGrid '11 keynote given by NSF's Ed Seidel) is a powerful example of Cactus's capabilities. The Toolkit consists of an open set of more than 100 Cactus "thorns," or application modules, for computational relativity, along with associated tools for simulation management and visualization. Thanks to this development model, the code has grown tremendously over the last several years.
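The value of a modular design is that independent groups can contribute self-contained modules that a shared core assembles at run time. The Python sketch below illustrates that general idea only; it does not reproduce Cactus's actual interfaces, which are written in C, C++, and Fortran. The `Thorn` and `Framework` classes are hypothetical: each module declares which other modules it depends on, and the core uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to run them in a valid order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable

State = dict[str, float]


@dataclass
class Thorn:
    """Hypothetical module: a name, a step function, and the modules it depends on."""
    name: str
    step: Callable[[State], None]
    requires: list[str] = field(default_factory=list)


class Framework:
    """Hypothetical core: registers modules and runs them in dependency order."""

    def __init__(self) -> None:
        self._thorns: dict[str, Thorn] = {}

    def register(self, thorn: Thorn) -> None:
        self._thorns[thorn.name] = thorn

    def run(self, state: State) -> list[str]:
        # Reject dependencies on modules nobody has registered.
        needed = {dep for t in self._thorns.values() for dep in t.requires}
        missing = needed - set(self._thorns)
        if missing:
            raise ValueError(f"unregistered dependencies: {sorted(missing)}")
        # Map each module to its prerequisites and sort topologically.
        graph = {name: set(t.requires) for name, t in self._thorns.items()}
        order = list(TopologicalSorter(graph).static_order())
        for name in order:
            self._thorns[name].step(state)
        return order


fw = Framework()
# Registration order does not matter; the declared dependencies do.
fw.register(Thorn("evolve", lambda s: s.update(t=s["t"] + s["dt"]), requires=["init"]))
fw.register(Thorn("init", lambda s: s.update(t=0.0, dt=0.1)))
fw.register(Thorn("output", lambda s: print(f"t = {s['t']:.1f}"), requires=["evolve"]))

state: State = {}
order = fw.run(state)
print(order)
```

Because each module states its own requirements, a group can add a new module without editing existing ones, and the core rejects a configuration whose dependencies are missing instead of failing partway through a run.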

"Our aim is to provide the core computational tools that can enable new science, broaden our community, facilitate interdisciplinary research and take advantage of emerging petascale computers and advanced cyberinfrastructure," said Gabrielle Allen, Associate Professor in Computer Science at Louisiana State University and lead of the Cactus Code project.
 
Whether through the controlled chaos of the Enzo evolution, the add-on extensibility of the iPlant API, or the parallel framework offered by Cactus, successful models of community code creation and evolution are critical to the continued growth of the scientific computing community.

-----

Source: TeraGrid
