The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
March 21, 2008
University of Utah researchers use parallel genetic algorithms to predict crystal structures for a variety of organic substances
NCSA has helped Julio Facelli do his work for years, and that gives him an uncommon perspective on the impact that the center and others like it have had.
"I've been following [the National Science Foundation-supported centers] since the beginning," says Facelli, "and they've done a tremendous job of encouraging new approaches to science. In the beginning, users were the traditional suspects. But now more and more people realize that they need computational simulation to understand their experiments. In every field, the centers have had an impact on the way people do science."
"Simulation science is becoming part of the mainstream toolkit of the experimental scientist. The NSF centers drove that."
Facelli, director of the University of Utah's Center for High Performance Computing and a biomedical informatics professor, uses NCSA's Mercury cluster to predict crystal structures for organic molecules that are frequently used in pharmaceuticals, fertilizers, and explosives.
"Fifteen years ago, the drug companies wouldn't think about talking to a guy like me," he says. "But computational molecular science -- numerical analysis -- can now show them the promise of a good solution for their formulation problems."
Supercomputing is part of the reason for this shift, but so is the method that he and his team have developed for calculating crystal structures. This method was featured in a 2007 Journal of Chemical Theory and Computation publication.
Survival of the fittest
Crystal structure refers to the identical pattern of atoms that repeats over and over to form the macroscopic material. The details of this microscopic duplication have a big impact on the substance's macroscopic properties, influencing features like solubility, reactivity, and color.
To model the crystal structure of a given substance, Facelli and his team begin with nothing more than the atoms in the molecule and the nature of their bonds. They're looking for structures with the lowest energies, which typically mark the molecules' standard crystal structures or something very close. The problem is that these simple inputs and this simple goal yield billions of possible solutions. Just imagine finding the needle of the lowest energy in that haystack of possible structures.
To narrow the search to more likely candidate structures, the team uses a parallel genetic algorithm called MGAC, which they have developed over the past six or seven years, drawing on systems at NCSA and on TeraGrid.
"An exhaustive search is not feasible, so we have to have a way to direct the search," says Facelli. "[Genetic algorithms] are based on the principle of survival of the fittest. Trial solutions compete with one another in the population for survival and produce offspring for the next generation of tests. These algorithms offer excellent scaling properties, which make them good for large-scale parallel computing systems like those at NCSA and emerging computational grids like TeraGrid."
For example, an initial calculation on NCSA's Mercury may run 20 simulations on 20 different processors simultaneously, calculating possible crystal structures for a given set of atoms and their bonds. The energies for these structures are compared. The 10 with the lowest energies are kept, and the features of those 10 are mixed and matched to generate another 10 candidate structures. Energies are calculated again, comparisons are made, the best candidates are kept, and the cycle continues.
The "mating operation," as the mixing and matching is called, stagnates quickly, producing very similar structures over the course of thousands of generations. To combat this lack of variety, the genetic algorithm also introduces arbitrary mutations into the process, occasionally taking one variable from one of the best candidates and substituting a random number for that variable in the next generation.
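The cycle described above (evaluate 20 candidates, keep the best 10, mate them to produce 10 more, occasionally mutate) can be sketched in a few lines of Python. This is only an illustration and not the MGAC code: the energy function, the six-variable "genome," the value range, and the mutation rate are all assumptions made for the example, and the evaluations run serially rather than on separate processors.

```python
import random

# Hypothetical stand-in for an energy calculation. In the article each
# evaluation is a full simulation on its own processor; here it is a toy
# function whose minimum (energy 0) sits at TARGET.
TARGET = [0.3, -1.2, 2.5, 0.8, -0.4, 1.9]


def toy_energy(genome):
    return sum((g - t) ** 2 for g, t in zip(genome, TARGET))


def random_genome(rng, n=len(TARGET), low=-5.0, high=5.0):
    return [rng.uniform(low, high) for _ in range(n)]


def mate(rng, parent_a, parent_b):
    # "Mix and match": each variable is copied from one parent or the other.
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]


def mutate(rng, genome, rate=0.1, low=-5.0, high=5.0):
    # Occasionally replace one variable with a random number.
    child = list(genome)
    if rng.random() < rate:
        child[rng.randrange(len(child))] = rng.uniform(low, high)
    return child


def evolve(generations=200, pop_size=20, keep=10, seed=0):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    history = []  # lowest energy seen at the start of each generation
    for _ in range(generations):
        population.sort(key=toy_energy)
        survivors = population[:keep]  # the 10 lowest-energy candidates
        history.append(toy_energy(survivors[0]))
        children = []
        while len(children) < pop_size - keep:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(rng, mate(rng, a, b)))
        population = survivors + children
    population.sort(key=toy_energy)
    history.append(toy_energy(population[0]))
    return population[0], history


if __name__ == "__main__":
    best, history = evolve()
    print("lowest energy found:", history[-1])
```

Because the survivors are carried over unchanged, the lowest energy in the population can never get worse from one generation to the next. The energy evaluations within a generation are independent of one another, which is what lets them be spread across many processors.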
From billions to hundreds
Even with a genetic algorithm greatly reducing the number of candidate structures and the amount of time it takes to find them, the researchers are still frequently left with more than a million possibilities. To narrow it down further, they find and remove the many structures that are the same physically but have very slightly different profiles numerically, "essentially getting rid of the rounding errors," Facelli says.
Next, they eliminate candidates that are the same structurally but that have revealed themselves in different orientations. This post-processing, which is done automatically at Utah's Center for High Performance Computing, eventually gets the number of candidates down to a couple of hundred.
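One way to picture both filtering steps is to reduce each candidate to a fingerprint that ignores orientation and is rounded to a fixed precision, then keep one candidate per fingerprint. The sketch below is an assumption for illustration, not the procedure used at Utah: it treats a structure as a bare list of 3D atom positions, uses sorted pairwise distances as the fingerprint, and picks the rounding precision arbitrarily. Sorted distances also cannot tell mirror images apart, and values that fall near a rounding boundary can be split into different groups, so a real comparison would need more care.

```python
import itertools
import math


def fingerprint(atoms, decimals=3):
    # Pairwise distances do not change when the structure is rotated or
    # shifted; rounding absorbs tiny numerical differences.
    dists = sorted(math.dist(a, b) for a, b in itertools.combinations(atoms, 2))
    return tuple(round(d, decimals) for d in dists)


def deduplicate(structures, decimals=3):
    # Keep the first structure seen for each distinct fingerprint.
    seen = {}
    for structure in structures:
        seen.setdefault(fingerprint(structure, decimals), structure)
    return list(seen.values())


if __name__ == "__main__":
    original = [(0, 0, 0), (1, 0, 0), (0, 2, 0)]
    rotated = [(0, 0, 0), (0, 1, 0), (-2, 0, 0)]        # same shape, turned 90 degrees
    noisy = [(0, 0, 0), (1.0000001, 0, 0), (0, 2, 0)]   # same shape, rounding noise
    different = [(0, 0, 0), (1, 0, 0), (0, 3, 0)]
    print(len(deduplicate([original, rotated, noisy, different])))
```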
"It's important to understand we're not so much trying to predict the exact structure. There are dynamic factors influencing the crystal growth, which means that the experimental structure might sometimes be higher than the lowest energy," Facelli says. Experts can then compare the remaining structures and narrow them down to the 10 or 20 that they want to test in the lab. "We take it from the billions of possibilities and say, 'Here are the most probable.'"
For more information, visit www.chpc.utah.edu/~facelli.
This research is supported by the National Science Foundation and the National Institutes of Health.
Team members
Victor Bazterra
Julio Facelli
Marta Ferraro
Matthew Thorley
-----
Source: NCSA Access Magazine, Fall 2007 -- http://www.ncsa.uiuc.edu/News/archive.html