SCIENCE & ENGINEERING NEWS
San Diego, CALIF. — Alan Boyle reports for MSNBC that for decades, researchers have been puzzling over the mysteries of protein folding – the “machinery of life” that translates DNA’s instructions into action. Solving the mysteries could lead to new treatments for Alzheimer’s disease and cancer, but progress has come slowly. Now, programmers are planning to put thousands of Internet users on the case, using distributed computing and open-source concepts.
Experts agree that solving the protein-folding puzzle would represent a milestone in biotechnology. “It’s of fundamental importance for genomics,” said Sorin Istrail, senior director of informatics research at Celera Genomics, which was involved in decoding the human genome. “It’s the central understanding of the machinery of life.”
Living cells assemble amino acids into thousands of types of protein that carry out tasks such as carrying oxygen through the bloodstream, flexing muscles and fighting infection. Each protein's chain of amino acids twists and folds automatically into just the right shape to do its work, based on complex chemical interactions. But sometimes those interactions can go haywire, resulting in conditions associated with a long list of diseases, including Alzheimer’s, mad-cow disease, cystic fibrosis and some forms of cancer. Thus, understanding the protein-folding process and how to keep it from going astray could save lives as well as unlock genetic secrets.
Scott Le Grand, a molecular biologist turned computer programmer, believes his team can succeed where others have gotten bogged down, by turning loose a screensaver program whimsically called “Folderol.”
“It’s a big problem,” he said. “It’s not solved yet. It hasn’t been solved in the 40 years since it was first discovered, and it looks like the computers are ready to solve this for us. We just have to come up with the right algorithm.”
Le Grand believes he has the right algorithm. Like other distributed computing projects, such as SETI@home and Distributed.Net, the Folderol software would let Internet users download scientific data, run it on their own computers using spare processing cycles, then send the results back to a central database. Folderol was released for public downloading early Friday, September 8.
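The download, compute, upload cycle described above can be sketched as a simple work-unit queue. This is a minimal in-process illustration, not Folderol's actual code: the `WorkUnit` and `CentralServer` classes, the `simulate` placeholder and the sample sequence are all hypothetical, and a real project would exchange work units over the network.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    unit_id: int
    sequence: str  # hypothetical payload: the protein's amino-acid sequence
    seed: int      # hypothetical: a different random seed per independent run


@dataclass
class CentralServer:
    pending: deque = field(default_factory=deque)  # work not yet handed out
    results: dict = field(default_factory=dict)    # the "central database"

    def fetch(self):
        """Hand out the next work unit, or None when nothing is left."""
        return self.pending.popleft() if self.pending else None

    def submit(self, unit_id, result):
        """Record a finished result sent back by a volunteer's computer."""
        self.results[unit_id] = result


def simulate(unit):
    """Placeholder for the real simulation that would run on spare cycles."""
    return (unit.seed * 31 + len(unit.sequence)) % 7


def volunteer_client(server):
    """Download a unit, compute it, upload the result; repeat until empty."""
    completed = 0
    while (unit := server.fetch()) is not None:
        server.submit(unit.unit_id, simulate(unit))
        completed += 1
    return completed


server = CentralServer(deque(WorkUnit(i, "MKTAYIAKQR", seed=i) for i in range(5)))
completed = volunteer_client(server)
print(completed, sorted(server.results))  # 5 [0, 1, 2, 3, 4]
```

Because each unit carries its own seed, many volunteers can work on the same protein without coordinating with one another; only the central server needs to track which units are done.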
Meanwhile, Entropia, a distributed-computing firm based in San Diego, is talking with researchers about putting its own Internet grid to work on the protein puzzle. “Imagine that, even while you were using your computer, 98 percent of those cycles while you were typing fell on the floor,” Jim Madsen, Entropia’s president and chief executive officer, told MSNBC.com. “Your PC could have been working on protein folding while you were writing this story.”
Could thousands of desktop computers succeed where supercomputers have failed? Istrail, who led a protein-folding simulation project at Sandia National Laboratories before moving over to Celera, said he’s “extremely skeptical.”
“When it comes to factoring numbers, everybody understands the problem,” he told MSNBC.com. “(But) protein folding is so complex. The best minds in this world have been working on this problem for 40 years, and we’re still somehow in mysterious territory.”
The problem is that the protein-folding process is so complex that some scientists believe cracking the code just might be impossible. Figuring out all the possible permutations for a single protein would take billions of billions of years’ worth of brute-force calculations, by some estimates.
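A back-of-the-envelope calculation in the spirit of Levinthal's paradox shows where estimates like "billions of billions of years" (10^18 years) come from. The inputs here are illustrative assumptions, not figures from the article: a 100-residue chain, three possible shapes per residue, and a very fast search that checks 10^13 conformations per second.

```python
def brute_force_years(residues=100, states_per_residue=3, checks_per_second=1e13):
    """Years needed to visit every conformation of a chain, one at a time.

    All three defaults are illustrative assumptions, not measured values.
    """
    conformations = states_per_residue ** residues  # 3**100 is about 5.2e47
    seconds = conformations / checks_per_second
    return seconds / (365.25 * 24 * 3600)  # seconds in a Julian year


years = brute_force_years()
print(f"{years:.1e} years")  # 1.6e+27 years
```

Even with generous assumptions the total dwarfs 10^18 years, because every added residue multiplies the search space rather than adding to it.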
Folderol would take a different approach, said Le Grand, who wrote several papers on protein folding and edited a textbook on the subject during nine years of research. The program doesn’t check every possible permutation by brute force. Instead, it farms out data on a particular protein to run on multiple computers, and eventually compares the results from millions of parallel simulations. It would take roughly two to six hours for each user to complete work on a protein with 100 amino acids, Le Grand said. “If a million runs (of the simulation) run independently, and a thousand runs converge on what’s roughly the same thing, then that is the most likely conformation,” he said.
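Le Grand's consensus idea (many independent runs, then find the answer that the most runs converge on) can be sketched as a simple clustering step. This is an assumed illustration, not Folderol's published method: each run's final structure is reduced to a short hypothetical vector of angles, plain Euclidean distance stands in for the structural comparisons real tools use (such as RMSD), and "greedy leader clustering" is a swapped-in technique chosen for brevity.

```python
import math


def distance(a, b):
    """Euclidean distance between two structures given as angle vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def consensus_structure(results, tolerance):
    """Greedy leader clustering: each result joins the first cluster whose
    leader lies within `tolerance`; otherwise it starts a new cluster.
    Returns the leader of the largest cluster and how many runs it holds."""
    clusters = []  # each entry is [leader, count]
    for structure in results:
        for cluster in clusters:
            if distance(structure, cluster[0]) <= tolerance:
                cluster[1] += 1
                break
        else:
            clusters.append([structure, 1])
    leader, count = max(clusters, key=lambda c: c[1])
    return leader, count


# Hypothetical data: 20 runs that wandered off in different directions,
# plus 30 runs that converged on roughly the same shape.
scattered = [(i * 10.0, 0.0, 0.0) for i in range(20)]
converged = [(60 + i % 3 * 0.5, 120, 180) for i in range(30)]
leader, count = consensus_structure(scattered + converged, tolerance=5.0)
print(leader, count)  # (60.0, 120, 180) 30
```

The tolerance decides what counts as "roughly the same thing"; too small and near-identical answers split apart, too large and unrelated answers merge.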
Concentric circles dance across the screen while Folderol is running, as illustrated in this screenshot. The numbers on the left side of the screen show statistics about the target protein being analyzed. Folderol’s developers say the graphic look of the program will evolve as updated versions are released.
Le Grand and his colleagues say they’d like to let other computer users modify Folderol’s source code, as long as the code relating to data distribution can be protected.
“I would really like to see homebrew hackers get into this the same way they’ve gotten into prime factorization and encryption,” Le Grand said. “If I can provide them a code base that they can work with as building blocks, then I think I can get them involved.”
The Folderol team – which also includes mathematician-musician Stephanie Wukovitz and artist-engineer Doug Engel – already has some Internet cachet: The trio was involved in the creation of BattleSphere, a video game for the Atari Jaguar that has attracted a cult following. “A lot of the algorithms for a 3-D video game are the same algorithms that one would use to simulate the folding of proteins, so there was a natural overlap,” Le Grand said.
The Folderol team plans to analyze the same protein data that’s used for the Critical Assessment of Techniques for Protein Structure Prediction, a biennial gathering where researchers gauge how much progress they’ve made on the protein puzzle. That should provide a good opportunity for judging Folderol’s success.
Several companies already have been built around the application of distributed computing to medicine and biotechnology. Entropia offers a range of team projects that participants in 83 countries can sign up for. “If someone who was dear to you had Alzheimer’s disease, and we were working on some research on ways to stall the progress of that disease, that’s a valuable thing to offer,” explained Tim Cusac, the company’s senior marketing analyst. Scott Kurowski, vice president for business development, indicated that Entropia’s smorgasbord could include protein analysis.
“It’s conceivable the genetic algorithm approach could be applied to this,” he said. “We’ve implemented similar kinds of technologies in biotechnology solutions.” Such a project could offer a way to compare different strategies for simulating protein processes. “Rather than hoping to identify the ultimate algorithm, you can run several of them and determine, based on the result, which is the most appropriate to use,” Cusac said. Entropia’s executives said comparing families of proteins may be a better approach to the problem than trying to analyze each and every protein. Researchers estimate that millions of proteins can be found in nature, grouped into just 5,000 families that share similar structures.
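Cusac's suggestion to run several algorithms and keep whichever gives the best result can be sketched as follows. Entropia did not describe its code, so everything here is hypothetical: `toy_energy` stands in for a real folding score (lower is better), and the three search strategies are simple illustrations. None of them is the genetic algorithm Kurowski mentioned.

```python
import random


def toy_energy(x):
    """Hypothetical stand-in for a folding energy score; lower is better."""
    return (x - 3.7) ** 2


def grid_search(energy):
    """Try evenly spaced points from -10 to 10 in steps of 0.5."""
    return min((i * 0.5 for i in range(-20, 21)), key=energy)


def random_search(energy, samples=50, seed=1):
    """Try random points; the fixed seed keeps the run reproducible."""
    rng = random.Random(seed)
    return min((rng.uniform(-10, 10) for _ in range(samples)), key=energy)


def hill_climb(energy, start=0.0, step=0.1, iterations=200):
    """Move one step at a time in whichever direction lowers the energy."""
    x = start
    for _ in range(iterations):
        best = min((x - step, x, x + step), key=energy)
        if best == x:
            break  # no neighbor is better: a (local) minimum
        x = best
    return x


def pick_best_strategy(strategies, energy):
    """Run every strategy and keep the one whose answer scores lowest."""
    results = {name: run(energy) for name, run in strategies.items()}
    winner = min(results, key=lambda name: energy(results[name]))
    return winner, results[winner]


strategies = {"grid": grid_search, "random": random_search, "hill_climb": hill_climb}
winner, value = pick_best_strategy(strategies, toy_energy)
print(winner, round(value, 3))  # hill_climb 3.7
```

On a grid of idle PCs, each strategy could run on different machines in parallel; the comparison step at the end only needs the final scores.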
Istrail, meanwhile, said more attention should be devoted to identifying and classifying proteins. “Computing will get you only so far,” he said. “What we need is to understand the principles.” He said much more data would be needed to start figuring out the principles of protein folding. “What will be a tremendous help is an industrial as opposed to a piecemeal approach to the problem. There are 6,000 or 7,000 structures in the database … that’s not enough,” he said. Istrail would like to see 50,000 structures entered into the Protein Data Bank. “But how do you get to them?” he asked. “I think the area is in a big deadlock. We need a phase transition to a new environment for research.”
For more information, visit http://www.folderol.org/.