The UCSF Institute for Human Genetics analyzes 7 petabytes of data with help from Dell

By Nicole Hemsoth

January 7, 2011

Imagine a world in which genetic tests could identify which people were more likely to come down with serious illnesses. It might be frightening to hear that you were particularly susceptible to, say, Alzheimer’s disease. But what if doctors could forestall the disease’s onset by treating patients before they show any symptoms? Individuals whose genes indicate that they are at a high risk might begin treatments early enough to save themselves and their family members a great deal of suffering. Genotyping may help move us toward this better medical future.

Genotyping is a process that measures the genetic variation among members of a species. The most common type of variation between two individuals is the single nucleotide polymorphism (SNP), and SNPs can be linked to many human diseases. By identifying genetic differences in a large population and then correlating individuals’ genetic makeup with their medical history, SNP genotyping may identify the genetic markers that signal a person’s likelihood of developing a particular illness.

Collaborating for the future of medicine

A genotyping project with the scope to enable this type of breakthrough is currently underway at the Institute for Human Genetics (IHG) at the University of California, San Francisco (UCSF). The project is a collaboration between the IHG and Kaiser Permanente. “Kaiser has been recruiting from its patient population and gaining voluntary consent from individuals who want to participate in the study,” says Brad Dispensa, director of IT and information security at the IHG and the UCSF Center for Cerebrovascular Research. “Patients who consent to be genotyped provide a saliva sample. Kaiser has already recruited 100,000 patients, which is a pretty astronomical feat for this type of study. Ultimately we plan to include 700,000 individuals in the research, and we’re looking at variations in 700,000 different SNPs.”

True to the university’s mission of “advancing healthcare worldwide™”, the research will benefit medical researchers beyond its own walls. “Not only are we going to generate this data set for our own analysis, but some of the data is going to be available to the scientific community at large,” explains Dispensa. “People can start looking for gene markers for diseases like Alzheimer’s or diabetes. We’re very excited, because nobody’s done anything like this before with the number of patients and the number of SNPs we’re including.”

Preparing for 7 petabytes of data

Pushing the envelope of scientific research usually requires cutting-edge technology. In the case of the Kaiser/UCSF project, that technology is the Axiom™ Genotyping Solution from Affymetrix, which comes with array plates that display genetic samples for analysis, a proprietary database of validated genomic markers, tools for array processing and Genotyping Console Software.

“The Axiom platform uses a new type of array plate that greatly increases the throughput,” says Dispensa. “The problem with being on the cutting edge is that you have to deal with very large data outputs. In our case, one 96-array plate, which accommodates saliva samples from 96 patients, contains about one terabyte of data. We’d need 1,042 plates to handle just our initial test size of 100,000 patients—so the complete project will involve more than 7 petabytes of data. We needed to figure out how we were going to be able to store and process all this information.”
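The plate and storage figures Dispensa cites can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the one-terabyte-per-plate figure is the approximate value from the quote, and decimal units (1,000 terabytes per petabyte) are assumed.

```python
import math

PATIENTS_PER_PLATE = 96   # one 96-array plate holds samples from 96 patients
TB_PER_PLATE = 1.0        # "about one terabyte" per plate (approximate)
TB_PER_PB = 1000          # assumption: decimal units


def plates_needed(patients: int) -> int:
    """Number of plates required, rounding up a partially filled last plate."""
    return math.ceil(patients / PATIENTS_PER_PLATE)


def data_volume_pb(patients: int) -> float:
    """Approximate raw data volume in petabytes for a given cohort size."""
    return plates_needed(patients) * TB_PER_PLATE / TB_PER_PB


if __name__ == "__main__":
    for cohort in (100_000, 700_000):
        print(cohort, plates_needed(cohort), round(data_volume_pb(cohort), 3))
```

Under these assumptions, the initial 100,000 patients need 1,042 plates (about 1 petabyte), and the planned 700,000 patients need 7,292 plates, which is where the figure of more than 7 petabytes comes from.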

Dell offers end-to-end solution

Dispensa and his colleagues undertook a broad search for the right hardware solution. “We went to every major vendor,” he says, “but in the end, the finalists were HP and Dell.” After hands-on demonstrations of both vendors’ equipment, the IHG implemented an all-Dell solution.

Sixteen Dell PowerEdge M610 blade servers with Intel Xeon processors 5500 series now sit in a Dell PowerEdge M1000e modular blade enclosure, supported by Dell EqualLogic PS6000XV and PS4000 series iSCSI storage arrays. A Dell PowerConnect 6248 Layer 3 switch provides one-gigabit Ethernet connections to the head controller node, a workstation on top of the rack that issues commands to the slave nodes in the chassis. A Dell PowerConnect M8024 Layer 3 switch provides 24 10-gigabit Ethernet ports to the blade chassis.

“We went with Dell blades running CentOS Linux partly because we like the way the fabric integration on the chassis works, and partly because the Integrated Dell Remote Access Controllers (iDRAC) are included at no charge with the server hardware,” says Dispensa. “HP’s Integrated Lights-Out (iLO) solution would have required us to pay a license fee for every component that we wanted to activate.”

The HP iLO licenses can cost up to $400 each and need to be renewed annually. “Lights-out management is essential because our experiment will be running 24×7 for two years,” says Dispensa. “I needed a way to make sure that I could interact with the equipment at the bare-metal level from anywhere in the world at any time.”

Dispensa is pleased to be saving both the $8,000 and the hassle involved in managing another licensing agreement. “Not adding to the number of licensing agreements we have to keep organized was very desirable,” he says.

No room for downtime

Another factor that was pivotal in the IHG’s selection of Dell was the need to eliminate any chance of downtime. Explains Dispensa: “This operation is running 24×7, 365, and it won’t stop until about two years from now. While it’s running, there’s absolutely no room for downtime. The Affymetrix machine is going to pump out data regardless of what’s happening with the storage solution.”

In the event of a SAN failure, the genotyping machine could store data locally for about an hour, but once its temporary storage cache filled up, it would shut down, bringing research to a halt. “That would be catastrophic,” says Dispensa. “There are eight plates in the machine at any time, and they would be ruined at a total cost of close to $250,000. That’s more than twice our IT budget for the whole project.”

Full redundancy in half the space—at half the cost

To minimize the risk of downtime, the IHG wanted a RAID 50 configuration, which provides drive redundancy while maximizing usable storage space, along with fully redundant storage controllers in the SANs. The Dell EqualLogic arrays made this architecture possible.
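To see why RAID 50 balances redundancy against capacity, consider a rough usable-capacity calculation. RAID 50 stripes data across two or more RAID 5 groups, and each group gives up one drive's worth of space to parity. The sketch below is a generic illustration; the drive count, drive size and hot-spare count in the example are hypothetical and are not taken from the IHG's actual configuration.

```python
def raid50_usable_gb(total_disks: int, groups: int, disk_gb: int,
                     hot_spares: int = 0) -> int:
    """Usable capacity of a RAID 50 set: striped RAID 5 groups,
    each losing one disk's capacity to parity."""
    active = total_disks - hot_spares
    if groups < 2:
        raise ValueError("RAID 50 needs at least two RAID 5 groups")
    if active % groups != 0:
        raise ValueError("active disks must divide evenly across groups")
    per_group = active // groups
    if per_group < 3:
        raise ValueError("each RAID 5 group needs at least three disks")
    return groups * (per_group - 1) * disk_gb


if __name__ == "__main__":
    # Hypothetical example: 16 drives of 600 GB, 2 hot spares, 2 groups.
    print(raid50_usable_gb(16, 2, 600, hot_spares=2))
```

In this hypothetical layout, 12 of the 16 drives hold data, and the array can survive one drive failure in each RAID 5 group without losing data.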

“The storage solution was another major reason why we chose Dell,” says Dispensa. “If we had gone with HP LeftHand iSCSI storage, we wouldn’t have had room for both the RAID 50 configuration and the redundant controller in the same box. We would have had to buy twice the number of units. With Dell EqualLogic, we got an iSCSI storage solution that costs half as much and takes up half the floor space, which was very attractive.”

The Dell EqualLogic PS6000XV SAN uses SAS disk for high performance, while the EqualLogic PS4000-series array uses SATA disk for maximum capacity to store archived data. As the project continues and the volume of data it processes grows, the IHG expects to add more storage capacity to each.

Dispensa is pleased with the simplified scalability of Dell EqualLogic storage. “Let’s say I want to add another EqualLogic PS6000-series SAN,” he says. “Essentially, we just need to bolt it into the rack, wire it to the appropriate VLAN and make a few configuration changes, and we’re ready to go in about an hour. The storage pool dynamically increases. Being able to add storage without downtime or hassle is a huge benefit.”

Dell blades take on a supercomputer

Although the genotyping project has enormous data processing needs, the Dell solution is performing well, thanks in part to the Intel Xeon processors. “Genotyping involves looking at gigantic images,” says Dispensa. “Imagine an image on a billboard where the size and the intensity of every inch mean something. We need processors that can parse these massive image files, and the Intel Xeon 5500 architecture is perfect for that.”

In fact, between their processors and their RAM capacity, the 16 Dell PowerEdge M610 blades are nearly in the same league as UCSF’s supercomputer. The IHG didn’t use the supercomputer for this project because the volume of data it involves rendered movement of information across the university infrastructure impractical. “But we figured out that our 16 Dell blades in one blade enclosure and two EqualLogic storage arrays have about one-sixth of the amount of power of the entire supercomputer,” says Dispensa. “That’s a pretty bold statement considering that the supercomputer has hundreds of nodes.”

Simplified administration with Dell management tools

Dell server and storage management tools are simplifying administration for Dispensa and his colleagues. “Dell OpenManage is a really great solution,” he says. “And the iDRAC solution is great because it’s all-inclusive. It lets me control everything in the chassis from one single point of entry, then spider down to all the individual components, bringing hardware online and offline as desired.”

The IHG is using Dell EqualLogic SAN HeadQuarters (SAN HQ) software for centralized monitoring of the SANs’ performance. “It really is a great package,” Dispensa says. “It provides great metrics on what the storage infrastructure is doing at any given time, and it’s user-friendly for even the most novice administrators.”

Dell helped with the initial configuration and has remained a valuable resource to Dispensa and his team. “Our Dell account team has been very helpful,” he says. “This was an enormous undertaking, and meeting our budget and timeline were challenging, but with Dell’s help we met both. Dell has helped us figure out the most effective way to leverage the solution and get the performance we need.”

Ultimately, the performance of the Dell hardware and the support of the Dell account team have enabled the IHG to move forward with landmark research that could change the future of medicine around the world. “Dell really understood the impact that this project could have for the scientific community,” Dispensa concludes. “Dell’s reliable hardware and good advice are helping us move medical research forward.”

For more information, go to DellHPCSolutions.com.
