One of the highlights of SC14 was a focus on how HPC is expanding beyond its roots and cropping up in more and more places. Among the more interesting use cases to take center stage at this year’s event in New Orleans was the role of HPC in understanding and decoding the bread wheat genome. In the session, “Beyond Human – Sequencing the Complex Wheat Genome to Advance Global Food Security,” Tim Stitt, Head of Scientific Computing at The Genome Analysis Centre (TGAC), spoke about the institute’s work sequencing and assembling the bread wheat genome, one of the most complex genomes known.
The wheat genome is especially large, about five times the size of the human genome, which has led some to joke that perhaps that means it is five times smarter than humans.
For about three years now, the TGAC team has been relying on an SGI UV 2000 cluster equipped with Intel Xeon processors to provide its researchers with the power to sequence crop genomes. This helps them identify the complex traits responsible for qualities such as yield, pest and drought resistance, considered critical in the face of increased population density and uncertain climate patterns.
In a short video describing the group’s work, Richard Leggett, QC & Primary Analysis Project leader, observes, “without high-performance computing, we would not be able to do anything we do here. Genome analysis is just not possible on desktop machines, we need high-performance computing to analyze the massive data that we get out of the sequences.”
The sequencing platforms at TGAC generate 2 to 4 TB of data per week. The data comes off the instruments and is stored on the SGI UV 2000 cluster. The installation has 2,560 Intel cores and 20 TB of coherent main memory for in-memory computing in a single-image system. “This allows us to increase the computational tasks we can handle at one time,” says Stitt, “and the simple programming model allows us to scale up as scientific demand rises.”
The wheat genome is about five times larger than the human genome: wheat has about 17 billion characters of DNA, while humans have approximately three billion. The most common sequencing technology can read only about 100 to 300 of those letters at a time, hence the need to divide the genome into smaller segments. “Comparing billions of small sequences with the assembled genome requires enormous computing power and that’s why HPC is so important,” says Leggett.
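To put those figures in perspective, the short Python sketch below works through the arithmetic. It uses only the approximate numbers quoted above (17 billion and three billion characters, reads of 100 to 300 letters). The function names and the `coverage` parameter are illustrative assumptions and are not drawn from TGAC’s actual pipeline. The result is a lower bound on the number of reads needed to span the genome once; real projects read each position many times over.

```python
# Back-of-the-envelope arithmetic using the approximate figures quoted in the article.
WHEAT_GENOME_BP = 17_000_000_000  # about 17 billion characters of DNA
HUMAN_GENOME_BP = 3_000_000_000   # about three billion characters of DNA


def size_ratio(genome_a_bp: int, genome_b_bp: int) -> float:
    """Return how many times larger genome A is than genome B."""
    return genome_a_bp / genome_b_bp


def min_reads(genome_bp: int, read_length_bp: int, coverage: int = 1) -> int:
    """Lower bound on the reads needed to span the genome `coverage` times.

    `coverage` is a hypothetical knob for illustration: 1 means every
    character is read exactly once, with no overlap between reads.
    """
    total_bp = genome_bp * coverage
    # Integer ceiling division: a partial read still counts as a read.
    return -(-total_bp // read_length_bp)


if __name__ == "__main__":
    print(f"Wheat vs. human size ratio: {size_ratio(WHEAT_GENOME_BP, HUMAN_GENOME_BP):.1f}")
    for read_length in (100, 300):
        reads = min_reads(WHEAT_GENOME_BP, read_length)
        print(f"{read_length}-letter reads needed for one pass: {reads:,}")
```

Even a single pass with the longest reads requires tens of millions of fragments, which is why comparing them against an assembled genome is a job for a large shared-memory system and not a desktop machine.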
“Our main goal is to understand crop genomes, particularly the bread wheat genome, so we can secure the UK’s food supply in the face of environmental changes and growing population,” notes Stitt.
The team reports having sequenced and assembled 17 out of the 21 wheat chromosomes.