April 6, 2011

University of Texas to House Largest Cancer Research Cloud

Nicole Hemsoth

The University of Texas has begun building its third data center, which officials expect to open later this year. Like the other two, it will have the capacity to hold almost three petabytes of data, a perfect fit for the genomic research project it is slated to handle.

The university system is home to the MD Anderson Cancer Center, which, like other organizations untangling genomics-driven problems, generates enormous amounts of data and needs quick access to it.

The center, which is doing cutting-edge work in cancer genomics, will require significant number-crunching capabilities. IDG reported that the center will create the world's largest HPC cluster dedicated to cancer research.

Lynn Vogel, CIO of MD Anderson in Houston, states that the effort is fueled by a large private cloud comprising on the order of 8,000 processors and a half-dozen shared large-memory machines, with hundreds of terabytes of data storage attached.

As IDG reported, while the research center’s “general server infrastructure uses virtualization, the typical foundational technology for cloud, this specialized research environment doesn’t. Rather, the organization uses an AMD-based HPC cluster to underpin the research cloud.” Researchers tap into the resource through a SOA-based web portal aptly named ResearchStation.

Vogel noted that the 8,000-processor HPC cluster at the heart of the private cloud is already operating at 80-90% of capacity, as did its predecessor, a 1,100-processor setup. On the storage front, the center will use an HP-Ibrix system that supports extreme scale-out, he explained.

Interestingly, the group behind the research did briefly consider public cloud alternatives, but the problems extended beyond the usual concerns about handling patient data. According to Vogel, “we’ve found on performance, access and in the management of that data, going to a public cloud is more risky than we’re willing to entertain, and we’re just not comfortable with the cloud given the actionable capability of a patient should there be a breach.”

Vogel also noted that public cloud providers lack an understanding of the complexity of the center's data and goals. He said, “As much as public cloud providers would like us all to believe, this is not just about dumping data into a big bucket and letting somebody else manage it.”

Full story at IDG