Imagine a world in which genetic tests could identify which people were more likely to come down with serious illnesses. It might be frightening to hear that you were particularly susceptible to, say, Alzheimer’s disease. But what if doctors could forestall the disease’s onset by treating patients before they show any symptoms? Individuals whose genes indicate that they are at a high risk might begin treatments early enough to save themselves and their family members a great deal of suffering. Genotyping may help move us toward this better medical future.
Genotyping is a process that measures the genetic variation among members of a species. The most common type of variation between two individuals is the single nucleotide polymorphism (SNP), and SNPs can be linked to many human diseases. So by identifying genetic differences in a large population and then correlating individuals’ genetic makeup with their medical history, SNP genotyping may identify the genetic markers that signal a person’s likelihood of developing a particular illness.
Collaborating for the future of medicine
A genotyping project with the scope to enable this type of breakthrough is currently underway at the Institute for Human Genetics (IHG) at the
True to the university’s mission of “advancing healthcare worldwide™”, the research will benefit medical researchers beyond its own walls. “Not only are we going to generate this data set for our own analysis, but some of the data is going to be available to the scientific community at large,” explains Dispensa. “People can start looking for gene markers for diseases like Alzheimer’s or diabetes. We’re very excited, because nobody’s done anything like this before with the number of patients and the number of SNPs we’re including.”
Preparing for 7 petabytes of data
Pushing the envelope of scientific research usually requires cutting-edge technology. In the case of the Kaiser/UCSF project, that technology is the Axiom™ Genotyping Solution from Affymetrix, which comes with array plates that display genetic samples for analysis, a proprietary database of validated genomic markers, tools for array processing and Genotyping Console Software.
“The Axiom platform uses a new type of array plate that greatly increases the throughput,” says Dispensa. “The problem with being on the cutting edge is that you have to deal with very large data outputs. In our case, one 96-array plate, which accommodates saliva samples from 96 patients, contains about one terabyte of data. We’d need 1,042 plates to handle just our initial test size of 100,000 patients—so the complete project will involve more than 7 petabytes of data. We needed to figure out how we were going to be able to store and process all this information.”
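The plate count in the quote follows from simple division. Here is a minimal Python sketch that uses only the figures quoted above (96 samples per plate, about one terabyte per plate, 100,000 patients). The function name and the decimal terabyte-to-petabyte conversion are assumptions made for illustration, and the sketch does not attempt to derive the 7-petabyte figure for the complete project.

```python
import math

SAMPLES_PER_PLATE = 96  # from the article: one 96-array plate holds 96 samples
TB_PER_PLATE = 1.0      # from the article: "about one terabyte" per plate


def plates_needed(patients, samples_per_plate=SAMPLES_PER_PLATE):
    """Round up, because a partially filled plate still counts as a whole plate."""
    return math.ceil(patients / samples_per_plate)


initial_plates = plates_needed(100_000)
initial_tb = initial_plates * TB_PER_PLATE
initial_pb = initial_tb / 1000  # assumption: decimal units, 1 PB = 1000 TB

print(initial_plates)            # 1042
print(f"{initial_pb:.2f} PB")    # 1.04 PB
```

Rounding up rather than down is what turns 1,041.67 into the 1,042 plates cited for the initial test size.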
Dell offers end-to-end solution
Dispensa and his colleagues undertook a broad search for the right hardware solution. “We went to every major vendor,” he says, “but in the end, the finalists were HP and Dell.” After hands-on demonstrations of both vendors’ equipment, the IHG implemented an all-Dell solution.
Sixteen Dell PowerEdge M610 blade servers with Intel Xeon processors 5500 series now sit in a Dell PowerEdge M1000e modular blade enclosure, supported by Dell EqualLogic PS6000XV and PS4000 series iSCSI storage arrays. A Dell PowerConnect 6248 Layer 3 switch provides one-gigabit Ethernet connections to the head controller node, a workstation sitting on top of the rack that issues commands for the slave nodes in the chassis, while a Dell PowerConnect M8024 Layer 3 switch provides 24 10-gigabit Ethernet ports to the blade chassis.
“We went with Dell blades running CentOS Linux partly because we like the way the fabric integration on the chassis works, and partly because the Integrated Dell Remote Access Controllers (iDRAC) are included at no charge with the server hardware,” says Dispensa. “HP’s Integrated Lights-Out (iLO) solution would have required us to pay a license fee for every component that we wanted to activate.”
The HP iLO licenses can cost up to $400 each and need to be renewed annually. “Lights-out management is essential because our experiment will be running 24×7 for two years,” says Dispensa. “I needed a way to make sure that I could interact with the equipment at the bare-metal level from anywhere in the world at any time.”
Dispensa is pleased to be saving both $8,000 in license fees and the hassle involved in managing another licensing agreement. “Not adding to the number of licensing agreements we have to keep organized was very desirable,” he says.
No room for downtime
Another factor that was pivotal in the IHG’s selection of Dell was the need to eliminate any chance of downtime. Explains Dispensa: “This operation is running 24×7, 365, and it won’t stop until about two years from now. While it’s running, there’s absolutely no room for downtime. The Affymetrix machine is going to pump out data regardless of what’s happening with the storage solution.”
In the event of a SAN failure, the genotyping machine could store data locally for about an hour, but once its temporary storage cache filled up, it would shut down, bringing research to a halt. “That would be catastrophic,” says Dispensa. “There are eight plates in the machine at any time, and they would be ruined at a total cost of close to $250,000. That’s more than twice our IT budget for the whole project.”
Full redundancy in half the space—at half the cost
To minimize the risk of downtime, the IHG wanted a RAID 50 configuration, which provides drive redundancy while maximizing storage space, along with fully redundant storage controllers in the SANs. The Dell EqualLogic arrays made this architecture possible.
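To make the trade-off concrete: RAID 50 stripes data across several RAID 5 groups, and each group gives up one drive's worth of space to parity. The sketch below shows that capacity arithmetic. The drive counts and sizes are hypothetical, not the IHG's actual configuration, and real arrays also reserve hot spares and formatting overhead that this ignores.

```python
def raid50_usable_tb(total_drives, groups, drive_tb):
    """Usable capacity of a RAID 50 set, in the same units as drive_tb.

    Each RAID 5 group loses one drive's capacity to parity, so the
    usable space is groups * (drives_per_group - 1) * drive_tb.
    """
    if groups < 2:
        raise ValueError("RAID 50 needs at least two RAID 5 groups")
    if total_drives % groups != 0:
        raise ValueError("drives must divide evenly across groups")
    per_group = total_drives // groups
    if per_group < 3:
        raise ValueError("each RAID 5 group needs at least three drives")
    return groups * (per_group - 1) * drive_tb


# Hypothetical example: 14 drives of 0.5 TB split into two RAID 5 groups.
usable = raid50_usable_tb(total_drives=14, groups=2, drive_tb=0.5)
raw = 14 * 0.5
print(usable, raw)  # 6.0 7.0
```

In this hypothetical layout, parity costs two of the fourteen drives, and the set survives one failed drive in each group at the same time.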
“The storage solution was another major reason why we chose Dell,” says Dispensa. “If we had gone with HP LeftHand iSCSI storage, we wouldn’t have had room for both the RAID 50 configuration and the redundant controller in the same box. We would have had to buy twice the number of units. With Dell EqualLogic, we got an iSCSI storage solution that costs half as much and takes up half the floor space, which was very attractive.”
The Dell EqualLogic PS6000XV SAN uses SAS disks for high performance, while the EqualLogic PS4000-series array uses SATA disks for maximum capacity to store archived data. As the project continues and the volume of data it processes grows, the IHG expects to add more storage capacity to each.
Dispensa is pleased with the simplified scalability of Dell EqualLogic storage. “Let’s say I want to add another EqualLogic PS6000-series SAN,” he says. “Essentially, we just need to bolt it into the rack, wire it to the appropriate VLAN and make a few configuration changes, and we’re ready to go in about an hour. The storage pool dynamically increases. Being able to add storage without downtime or hassle is a huge benefit.”
Dell blades take on a supercomputer
Although the genotyping project has enormous data processing needs, the Dell solution is performing well, thanks in part to the Intel Xeon processors. “Genotyping involves looking at gigantic images,” says Dispensa. “Imagine an image on a billboard where the size and the intensity of every inch mean something. We need processors that can parse these massive image files, and the Intel Xeon 5500 architecture is perfect for that.”
In fact, between their processors and their RAM capacity, the 16 Dell PowerEdge M610 blades are nearly in the same league as UCSF’s supercomputer. The IHG didn’t use the supercomputer for this project because the sheer volume of data made moving it across the university infrastructure impractical. “But we figured out that our 16 Dell blades in one blade enclosure and two EqualLogic storage arrays have about one-sixth of the amount of power of the entire supercomputer,” says Dispensa. “That’s a pretty bold statement considering that the supercomputer has hundreds of nodes.”
Simplified administration with Dell management tools
Dell server and storage management tools are simplifying administration for Dispensa and his colleagues. “Dell OpenManage is a really great solution,” he says. “And the iDRAC solution is great because it’s all-inclusive. It lets me control everything in the chassis from one single point of entry, then spider down to all the individual components, bringing hardware online and offline as desired.”
The IHG is using Dell EqualLogic SAN HeadQuarters (SAN HQ) software for centralized monitoring of the SANs’ performance. “It really is a great package,” Dispensa says. “It provides great metrics on what the storage infrastructure is doing at any given time, and it’s user-friendly for even the most novice administrators.”
Dell helped with the initial configuration and has remained a valuable resource to Dispensa and his team. “Our Dell account team has been very helpful,” he says. “This was an enormous undertaking, and meeting our budget and timeline was challenging, but with Dell’s help we met both. Dell has helped us figure out the most effective way to leverage the solution and get the performance we need.”
Ultimately, the performance of the Dell hardware and the support of the Dell account team have enabled the IHG to move forward with landmark research that could change the future of medicine around the world. “Dell really understood the impact that this project could have for the scientific community,” Dispensa concludes. “Dell’s reliable hardware and good advice are helping us move medical research forward.”
For more information, go to DellHPCSolutions.com