Hinxton, ENGLAND — Ben Hirschler reports that the gene machines at the Sanger Centre near Cambridge, responsible for mapping a third of the human genome, are still whirring night and day. Never mind that a working draft of the 3.1 billion letters in the “book of life” was completed on June 26. That is just the start of a coming tidal wave of genetic information.
Managing and interpreting the trillions of bytes of data, or terabytes, generated by the genomics revolution is now the biggest task facing biologists – and a major business opportunity for computer companies.
“Informatics is going to be the key challenge in the future,” said Richard Durbin, Sanger deputy head and the man in charge of making the number-crunching work.
Sequencing the repeats of the four letters, A, C, T and G, representing the nucleotides that make up DNA in humans and other species, has already piled up more than 22 terabytes of data on the Sanger Centre’s hard disk drives. That is equivalent to more than twice the entire contents of the U.S. Library of Congress.
Durbin expects it to rise to 50-100 terabytes within two or three years as researchers investigate how the three-billion-piece “parts list” embedded in our chromosomes determines the way we develop, age and fall victim to disease.
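A rough calculation shows what Durbin's forecast implies about how quickly the Sanger Centre's data would have to double. The sketch below assumes smooth exponential growth from today's 22 terabytes, which the article does not state, and the function name is purely illustrative.

```python
import math

def doubling_time_years(current_tb, target_tb, years):
    """Doubling time implied by growing from current_tb to target_tb in `years`.

    Assumes steady exponential growth (an assumption, not a Sanger figure).
    """
    return years * math.log(2) / math.log(target_tb / current_tb)

CURRENT_TB = 22  # data already on the Sanger Centre's disks, per the article

# Fastest case in the forecast: 100 terabytes reached in two years.
fastest = doubling_time_years(CURRENT_TB, 100, 2)

# Slowest case in the forecast: 50 terabytes reached in three years.
slowest = doubling_time_years(CURRENT_TB, 50, 3)

print(f"Doubling every {fastest * 12:.0f} to {slowest * 12:.0f} months")
```

Under those assumptions the forecast corresponds to the data store doubling roughly every 11 to 30 months.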
Both hardware and software companies are queuing to service this exploding market, which offers business opportunities well beyond academic institutions like the Sanger Centre, itself backed by the Wellcome Trust medical charity.
Biotechnology, pharmaceutical and agrochemical companies share the same requirement for supercomputing power, data storage and specialized software programs as they strive to apply genomic discoveries to new products.
That makes life science an increasingly enticing prospect for information technology (IT) companies – especially when sales of computers to other businesses are showing signs of slowing down.
International Business Machines Corp estimates the IT market for life sciences will surge from $3.5 billion this year to more than $9 billion by 2003, as the volume of life science data doubles every six months.
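The two IBM figures describe very different growth rates, which a short calculation makes concrete. This is a back-of-the-envelope sketch, assuming "this year" is 2000 (so the forecast spans three years) and that both trends hold steady; the function names are illustrative and not from IBM.

```python
def annual_growth_rate(start, end, years):
    """Compound annual growth rate implied by going from start to end."""
    return (end / start) ** (1 / years) - 1

def growth_factor(doubling_months, months):
    """Total multiplication of a quantity that doubles every doubling_months."""
    return 2 ** (months / doubling_months)

YEARS = 3  # assumed span: 2000 to 2003

# IT market for life sciences: $3.5 billion rising to $9 billion.
market_cagr = annual_growth_rate(3.5, 9.0, YEARS)

# Life science data: doubling every six months over the same period.
data_growth = growth_factor(6, YEARS * 12)

print(f"Market growth: about {market_cagr:.0%} a year")
print(f"Data volume: {data_growth:.0f} times larger after {YEARS} years")
```

On these assumptions the market would grow by roughly 37 percent a year, while the volume of data would multiply 64-fold over the same three years.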
“Big Blue” is designing products specifically for this new market, including a $100 million investment in a supercomputer known as “Blue Gene,” designed to simulate protein folding.
Calculating the way a single human protein folds – a key determinant in its interaction with other molecules – is a deceptively simple task.
But the possible permutations are so vast that it will take the monster computer a year to do the calculation, even though it will be 1,000 times more powerful than “Deep Blue,” the IBM machine that beat world chess champion Garry Kasparov in 1997.
Other computer firms have also woken up to the potential of the market unleashed by the genome.
IBM’s arch-rival Compaq Computer Corp, the world’s biggest personal computer maker, is already a big supplier of servers to the life science sector – including the Sanger Centre.
In a bid to cultivate a new generation of clients, Compaq plans to invest $100 million in start-up biotech companies.
Martin Walker, head of high performance computing for Compaq in Europe, sees a huge opportunity in this field for his firm’s top-of-the-range servers, which sell at substantially higher margins than PCs.
Close to 10 percent of the aggregate research and development budget of the world’s 12 biggest life science companies is already being spent on IT, and the figure is set to rise exponentially, he predicts. “Biology is going to become the dominant application in all of computational science,” Walker told Reuters.
Alongside the computer industry’s big guns are a growing number of specialist bioinformatics companies, springing up to service the booming market with customized software.
They include Germany’s Lion Bioscience AG, whose SRS, or Sequence Retrieval System, is starting to establish itself as a new industry standard. Shares in Lion have surged 50 percent since their August debut.
“It’s incredible how fast data is being put out in the public domain alone, and on top of that companies have proprietary databases which are also producing an incredible amount of data each day,” said Jan Mous, Lion’s chief scientific officer.
“The big challenge is to speed up analysis of that data so that you don’t have to wait half an hour to get the result back from a query.”
Other bioinformatics boutiques include U.S. firms Doubletwist and Genomica and Britain’s Synomics and Inpharmatica, while integrated genomics companies like Incyte Genomics, Celera Genomics and Genaissance all offer differing levels of capability in the field.