by Uwe Harms
In 1999, Compaq created a Bioinformatics Expertise Center in Marlboro, Massachusetts, to better support customers and business partners in the industry. Compaq’s Cambridge, Massachusetts, Research Laboratory also began to focus on bioinformatics, contributing to the optimization of application performance and the development of data mining algorithms for genetic data. Compaq’s most recent contribution to the Human Genome Project is a cluster of AlphaServer ES40 systems with 100 CPUs and a terabyte of storage (located at Compaq’s Enterprise Systems Lab in Littleton, Massachusetts), which is being made available to research institutions to complete the annotation of the human genome. It was there that Genoscope ran its first analysis of the complete draft of the human genome.
In 1997, France decided to rejoin the group of countries that had initiated large-scale sequencing by creating a “Groupement d’Intérêt Public”: the Genoscope (Centre National de Séquençage), for which the official notice of creation appeared in France’s “Journal Officiel” on January 1st, 1997. It is a non-profit organization located at Evry, France, and owns the second-largest sequencing facility in Europe.
The informatics infrastructure is based on a high-performance, high-availability UNIX computing environment. At a very early stage, Genoscope chose Compaq AlphaServers and Compaq StorageWorks as the basis for its IT architecture. The center compared all systems available on the market. The main decision criteria were resiliency, compute power and storage scalability.
The current computing system is a cluster of four Digital quad-processor computers (Compaq GS60, formerly the AS 8200), each with 525 MHz AXP 21264 (EV6) processors and 4 GB of memory. Peak performance totals about 17 GFlop/s. The main storage system is a set of disk racks with a total capacity of about 1 TByte (1000 GB). Backup is handled by a 330-cartridge robot with DLT7000 tape drives, with a capacity of 35 to 70 GB per cartridge.
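The figures above can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: it assumes that each 21264 processor can complete two floating-point operations per clock cycle (one add and one multiply), a figure the article does not state, and the function names are hypothetical.

```python
# Rough check of the cluster figures quoted in the article.
# ASSUMPTION: 2 floating-point operations per cycle per CPU
# (one add plus one multiply); the article does not state this.

NODES = 4              # quad-processor machines in the cluster
CPUS_PER_NODE = 4
CLOCK_MHZ = 525
FLOPS_PER_CYCLE = 2    # assumed, see note above

CARTRIDGES = 330
GB_PER_CARTRIDGE_MIN = 35
GB_PER_CARTRIDGE_MAX = 70


def peak_gflops(nodes, cpus_per_node, clock_mhz, flops_per_cycle):
    """Theoretical peak in GFlop/s: CPUs x clock x flops per cycle."""
    return nodes * cpus_per_node * clock_mhz * flops_per_cycle / 1000.0


def library_capacity_gb(cartridges, gb_per_cartridge):
    """Total capacity of the tape library in GB."""
    return cartridges * gb_per_cartridge


if __name__ == "__main__":
    peak = peak_gflops(NODES, CPUS_PER_NODE, CLOCK_MHZ, FLOPS_PER_CYCLE)
    low = library_capacity_gb(CARTRIDGES, GB_PER_CARTRIDGE_MIN)
    high = library_capacity_gb(CARTRIDGES, GB_PER_CARTRIDGE_MAX)
    print(f"Peak performance: {peak:.1f} GFlop/s")
    print(f"Tape library capacity: {low} to {high} GB")
```

Under that assumption the peak comes to 16.8 GFlop/s, consistent with the rounded 17 GFlop/s quoted, and the tape library holds roughly 11.5 to 23 TByte.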
The suitability of this choice was confirmed by the fact that the other major gene sequencing centres, the Sanger Centre in the UK, the MIT Whitehead Institute in the US and the two largest private sequencing companies, Celera Genomics and Incyte Genomics Inc, also chose AlphaServers and StorageWorks.
Genoscope played an important part in the international Human Genome Project (HGP), sequencing the entire long arm of chromosome 14 to 99% accuracy, with a target of 99.99%. The HGP’s recent completion and publication of the first complete draft of the human genome is one of the most important scientific achievements ever.
Now, for the first time anywhere in the world and in collaboration with Compaq, Genoscope has run an analysis of the complete draft of the human genome. The results are currently being scientifically analyzed at Genoscope and will provide a highly accurate prediction of the total number of genes in the human genome. The analysis used LASSAP (Large Scale Sequence compArison Package), a sequence comparison software package. The job ran on the AlphaServer cluster at Compaq’s facility in Littleton, Mass., mentioned above. The cluster has 25 nodes, each an AlphaServer ES40 with four EV67 CPUs. LASSAP runs routinely in multithreaded mode (4 CPUs) and message-passing mode (16 CPUs) on Genoscope’s own AlphaServers.
The complete analysis run of the whole draft dataset took only 38 hours on the 100 CPUs. The AlphaServer cluster needed 25% less time to complete a run 2.5 times larger than all previous runs made on any system available from any vendor.
Overall system performance was 2.5 times greater than that of all the competitive systems. Jean-Jacques Codani, CEO of Gene-IT, the authors of LASSAP, said that he was very impressed by the scalability, which was absolutely linear and in direct proportion to the number of processors. Gene-IT is a bioinformatics company formed out of INRIA, the French National Institute for Research in Computer Science and Control, and is involved in estimating the number of human genes.
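To make the linear-scaling remark concrete, the sketch below works through the arithmetic using only the figures reported in this article (25 nodes of 4 CPUs, 38 hours of wall-clock time). It is an idealized illustration, not a measurement: the 16-CPU projection assumes perfectly linear scaling and identical processors, which is a simplification, and the function names are hypothetical.

```python
# Idealized linear-scaling arithmetic from the figures in the article.
# ASSUMPTIONS: perfectly linear scaling and identical CPUs throughout;
# real runs on different hardware would deviate from these numbers.

NODES = 25
CPUS_PER_NODE = 4
WALL_HOURS = 38


def cpu_hours(cpus, wall_hours):
    """Total work consumed, expressed in CPU-hours."""
    return cpus * wall_hours


def wall_hours_if_linear(total_cpu_hours, cpus):
    """Wall-clock time for the same work if scaling were perfectly linear."""
    return total_cpu_hours / cpus


if __name__ == "__main__":
    total_cpus = NODES * CPUS_PER_NODE
    work = cpu_hours(total_cpus, WALL_HOURS)
    print(f"CPUs in the cluster: {total_cpus}")
    print(f"Work for the full run: {work} CPU-hours")
    # Hypothetical projection onto a 16-CPU message-passing configuration.
    print(f"Same work on 16 CPUs: {wall_hours_if_linear(work, 16)} hours")
```

With these inputs the run amounts to 3,800 CPU-hours; under ideal linear scaling the same work would occupy 16 CPUs for 237.5 hours, or nearly ten days.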
Gene-IT continues to improve the LASSAP code on the Alpha processor. The code will be optimized with the support of specialists from the Compaq HPTC Solution Centre in Annecy, France. Further improvements will be made to maintain Compaq’s position in delivering solutions for computational biology.
Uwe Harms is a supercomputing consultant and owner of Harms-Supercomputing-Consulting in Munich, Germany.