by Christopher Lazou, HiPerCom Consultants, Ltd.
San Diego, CALIF. - Over a year ago IBM announced the Blue Gene project, promising to develop a Petaflop/s computer to study protein structures. Such a machine would be a hundred times more powerful than the present 10 Teraflop/s computer installed at Lawrence Livermore National Laboratory, USA, which was also developed by IBM. To achieve Petaflop/s performance, a million processors must work in tandem, without failures, on a single problem: the direct simulation of a protein structure. This is uncharted territory. Getting the hardware to work reliably is a gargantuan engineering task, and modelling the science is at least as challenging.
Dr. Dennis Newns (from IBM’s Computational Biology Group at the T.J. Watson Laboratory, New York) visited the University of Cambridge, UK, on 19th October. He was touting for scientific collaboration, especially with scientists at the Sanger Centre (named after the British researcher Frederick Sanger, 1980 Nobel prizewinner for his 1977 work on sequencing techniques), which is responsible for the Human Genome Project (HGP) in the UK. Newns gave a seminar with the title: “The Blue Gene Petaflop Supercomputer Project, early milestones and Science Challenges”.
The lecture took as its starting point the Human Genome Project and the new research avenues which it has opened. One of these is the study of protein structures. This can take two forms: data mining of the DNA mapping combined with experiments or, more daringly, a frontal attack by direct computer simulation. The latter includes the simulation of both ion channels and protein functions such as membrane transmission. This is an exciting new development in biotechnology with enormously lucrative business potential. Note that protein misfolding is highly toxic to life. One example is mad cow disease, which devastated the beef industry in the UK.
Even when the computational problem of protein folding is approached using a free energy funnel, which confines the search to a small region of space rather than all configurations, a realistic simulation still requires ten to the power of fifteen operations per second. Hence the birth of the Blue Gene Project, which has at its heart a Petaflop/s computer.
IBM is not known for super fast processors. It has only the Power 3 and, next year, the Power 4, which are an order of magnitude slower than the proprietary processors produced by Japanese vendors, for example NEC, whose SX-5 processor technology has been adapted for the 40 Teraflop/s Japanese Earth Simulator. So how does IBM hope to deliver a Petaflop/s machine with this type of technology in the next four years?
According to Dennis Newns, the design of the processor has been more or less completed and is likely to be frozen in the next two to three months. The current design envisages a special processor with a restricted instruction set (57 instructions compared with the usual 256 or more) and a limited 4 Mbit of DRAM memory on the chip. Each chip will house 32 processors and, in addition to the DRAM memory, will have a small amount of fast SRAM memory for data staging, allowing for two instructions per clock cycle. Each processor thread will have a 2 nanosecond latency, but since 8 threads run in parallel the latency will be amortised, so that each chip will have a 32 Gflop/s peak performance. Even with this performance per chip, some 32 thousand chips are needed to reach a Petaflop/s rate. This will require a very large computer room full of racks and at least 2 MWatts of electrical power.
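The figures quoted in the talk can be cross-checked with a little arithmetic. The sketch below is only a back-of-envelope calculation: the inputs are the numbers reported above, the variable names are illustrative, and the derived power per chip is an estimate made here, not a figure given by IBM.

```python
# Back-of-envelope check of the Blue Gene figures quoted in the talk.
# Inputs are taken from the article; derived values are estimates only.

PETAFLOPS = 1e15            # target rate in flop/s
CHIP_PEAK = 32e9            # quoted peak per chip in flop/s
PROCESSORS_PER_CHIP = 32    # quoted processors per chip
POWER_WATTS = 2e6           # quoted lower bound ("at least 2 MWatts")

chips_needed = PETAFLOPS / CHIP_PEAK                     # chips for 1 Petaflop/s
total_processors = chips_needed * PROCESSORS_PER_CHIP    # processors overall
per_processor_peak = CHIP_PEAK / PROCESSORS_PER_CHIP     # flop/s per processor
watts_per_chip = POWER_WATTS / chips_needed              # derived, not quoted

print(f"chips needed:       {chips_needed:,.0f}")
print(f"total processors:   {total_processors:,.0f}")
print(f"peak per processor: {per_processor_peak / 1e9:.1f} Gflop/s")
print(f"power per chip:     {watts_per_chip:.0f} W (at the 2 MW lower bound)")
```

The result is 31,250 chips, which the article rounds to 32 thousand, and exactly one million processors, in line with the figure quoted at the start of the report.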
The architecture chosen uses a cube with a 1 Gbit link, reminiscent of the INMOS transputer. It has 1 GByte bandwidth, which according to preliminary simulations should be sufficient for this particular protein folding application.
There is no reason to doubt that Blue Gene will be built. The question is how to maintain system integrity with 32 thousand chips. This is no mean feat, since any chip failure will require connections to be re-routed and the atoms to be re-balanced across the remaining chips. One proposal is to mirror the calculations and to take frequent checkpoints, comparing the results at every time step.
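The mirroring proposal can be illustrated with a toy sketch. Nothing here reflects IBM's actual design: the `step` function is a hypothetical stand-in for one time step of a simulation, and `run_mirrored` and `fault_at` are names invented for this example. Two replicas advance in lockstep, a checkpoint is kept whenever they agree, and a disagreement triggers a rollback to the last agreed state.

```python
def step(state):
    """Hypothetical stand-in for one time step of the simulation."""
    return [x * 0.5 + 1.0 for x in state]


def run_mirrored(initial, n_steps, fault_at=None):
    """Advance two replicas in lockstep and compare them after every step.

    When the replicas agree, the state is saved as a checkpoint. When they
    disagree, both are rolled back to the last checkpoint and the step is
    repeated. `fault_at` injects one simulated fault (illustration only).
    """
    primary = list(initial)
    mirror = list(initial)
    checkpoint = list(initial)
    rollbacks = 0
    t = 0
    while t < n_steps:
        primary = step(primary)
        mirror = step(mirror)
        if fault_at == t:
            primary[0] += 1.0      # corrupt one replica, once
            fault_at = None
        if primary == mirror:
            checkpoint = list(primary)   # results agree: keep and move on
            t += 1
        else:
            primary = list(checkpoint)   # results differ: roll back, retry
            mirror = list(checkpoint)
            rollbacks += 1
    return primary, rollbacks


clean_state, clean_rollbacks = run_mirrored([0.0, 4.0], 10)
faulty_state, faulty_rollbacks = run_mirrored([0.0, 4.0], 10, fault_at=3)
print(clean_state == faulty_state, clean_rollbacks, faulty_rollbacks)
```

The price of this scheme is plain from the sketch: every step is computed twice, so half of the machine's peak is spent on verification before any rollback occurs.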
Assuming that system failures are infrequent enough for the machine to remain stable and produce results, how much of the peak will translate into sustained performance is currently anyone's guess. IBM is of course running simulations which should tell whether the chip will work; it is the size of the machine which is the biggest unknown. Note that for N chips, theory requires communication speeds to increase as N log N to keep pace, so the communication bottleneck will reduce performance by at least an order of magnitude unless some way is found to amortise it.
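To give a feel for the N log N rule, the sketch below tabulates it for a few machine sizes. The talk did not state the base of the logarithm, so base 2 is assumed here, and the comparison sizes of 256 and 4,096 chips are arbitrary choices made for illustration. Only the growth relative to a smaller machine is meaningful, not the absolute numbers.

```python
import math


def comm_demand(n_chips):
    """Relative communication demand under the N log N rule.

    Log base 2 is an assumption; the talk did not specify the base.
    """
    return n_chips * math.log2(n_chips)


baseline = comm_demand(256)   # arbitrary reference size for comparison
for n in (256, 4096, 32768):
    demand = comm_demand(n)
    print(f"N = {n:>6}: N log2 N = {demand:>9,.0f} "
          f"({demand / baseline:.0f}x the 256-chip case)")
```

Under these assumptions, going from 256 to 32,768 chips multiplies the chip count by 128 but the communication demand by 240, which is the widening gap the rule describes.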
IBM and some of its collaborators in various universities have been working on smaller problems to establish whether protein structure stability is sensitive to the force field, and whether the folding rate depends on the topological complexity of the fold. The results from the few simulations to date on the folding dynamics of small peptides are very positive.
At present, to check the stability of a fold, they use umbrella sampling calculations for the force fields. Even this restricted method, applied to a 36 residue protein with ten to the power of eight time steps, required 3 months of dedicated computing on a 256-node Cray T3E.
The Blue Gene project expects to improve on this, folding an 80 residue protein with ten to the power of eleven time steps in 3 months. The insights gained in understanding the mechanisms controlling biosystems have great potential for the design of a plethora of new products spanning the agriculture industry, the life sciences and biomedical technology.
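The two three-month runs compared above imply very different time-step rates. The sketch below works them out, assuming 30-day months (an assumption made here for simplicity) and ignoring the fact that the two runs use different methods and protein sizes, so it compares step counts only.

```python
# Time-step throughput implied by the two three-month runs.
# Assumption: a month is taken as 30 days; the article gives no exact duration.

SECONDS_PER_MONTH = 30 * 24 * 3600
run_seconds = 3 * SECONDS_PER_MONTH       # 7,776,000 s

t3e_steps = 1e8    # 36 residue protein on the 256-node Cray T3E
bg_steps = 1e11    # 80 residue protein targeted by Blue Gene

t3e_rate = t3e_steps / run_seconds            # steps per second on the T3E
bg_rate = bg_steps / run_seconds              # steps per second on Blue Gene
step_ratio = bg_steps / t3e_steps             # how many times more steps
bg_seconds_per_step = run_seconds / bg_steps  # wall-clock budget per step

print(f"T3E:       {t3e_rate:,.1f} steps/s")
print(f"Blue Gene: {bg_rate:,.0f} steps/s")
print(f"ratio:     {step_ratio:,.0f}x more time steps")
print(f"budget:    {bg_seconds_per_step * 1e6:.0f} microseconds per step")
```

On these assumptions Blue Gene would have to complete a thousand times as many time steps in the same period, leaving a budget of under 80 microseconds of wall-clock time for each step of the larger protein.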
Finally, Dr. Newns stated that the Genome project has opened an enormous new field in biotechnology, and that 50 years from now those alive will view current research in the same light as we view a UNIVAC computer of the late 1950s compared with present Teraflop/s machines. It is also the fastest-growing business around, with lucrative opportunities for computer vendors to deliver the essential modelling systems for designing the new biotechnology-based products.
A number of companies are already actively involved, such as Celera Genomics, founded by Dr. Venter. This has raised fears and fanned an ethical debate about the “ownership” of humanity’s genetic heritage.
In Europe, in addition to the HGP participants, many genomic research projects have received support from the EU. For example, the Quality of Life programme funds genomic research into human complaints such as cancer, infectious diseases, inherited deafness, autism and muscular dystrophy. Other projects focus on genomic tools for developing diagnosis and treatment methods.