July 09, 2012
by James Coffin, Ph.D., vice president and general manager, Dell Healthcare and Life Sciences
The transformation of healthcare from episodic to truly personalized care is being met with both optimism and the realities of a system that does not take into account a patient’s full health record (both genomic and non-genomic attributes) or the need to collaborate effectively to coordinate care. With the advent of new high-performance computing (HPC) technologies and genomic tools, we are entering an era where healthcare professionals can make more informed decisions on clinical care. While high-throughput research platforms, like next-generation sequencing (NGS), allow researchers to investigate genome-wide variations in genetic markers between normal and diseased tissue, they also create a new problem: managing, sharing and analyzing the massive amounts of patient-related genomic data that healthcare professionals need to improve diagnosis and treatment. A fresh approach is needed to bridge the gap between clinical research and practice, both to build a more complete picture of disease and treatment strategies and to allow healthcare professionals to share knowledge with other experts, determine the best course of care and improve outcomes.
A collaboration between the Translational Genomics Research Institute (TGen) and the Neuroblastoma and Medulloblastoma Translational Research Consortium (NMTRC) is underway on the world’s first FDA-approved personalized medicine clinical trial for pediatric cancer. TGen will use its genomic technology to help NMTRC identify a greater depth of personalized treatment strategies for children with neuroblastoma who are enrolled in the trial, which brings together scientific and medical partners from all across the country. Crucial to its success is an information technology (IT) platform that supports collaboration among the participating clinical sites to create the knowledge base critical for targeted care.
In conjunction with Dell and its partners, the organizations have built a best-in-class HPC and cloud-based IT infrastructure designed to accelerate genetic analysis and identification of targeted treatments for patients. As part of the infrastructure, trial-specific portals and a high-speed, grid-based architecture are being implemented to speed the transfer of genomic and relevant clinical data between collaborators in the trial. This will facilitate the integration of genomic data into the studies to build a unique medical profile for each patient, allowing clinicians to predict which of the available therapeutic drugs will be most effective. The goals of the collaboration are long-term object storage, quick data transfer between sites and transparency for everyone from patients to bioinformaticians to scientists to trial administrators.
The Big Data Challenge
The initial focus of the project was to tackle the “big data” challenge faced by the organizations performing the NGS experiments. The raw data generated by these instruments is extremely large and, in the case of TGen, is doubling every six months. The data objects are complex files that carry important metadata about the samples and the instruments themselves. They can be up to 3TB in size and require significant processing resources to collect, manage and analyze. With TGen managing up to 200TB in genomic data per patient, it was important to develop an IT strategy that allows the institute to analyze these massive files quickly and affordably. After all, the children enrolled in these studies require a quick turnaround to give them the best chance in fighting their disease.
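A doubling period of six months implies exponential growth. As a rough sketch, assuming that doubling rate holds steady, the projected data volume can be estimated as below. The starting volume of 100 TB is a hypothetical figure chosen for illustration, not one reported by TGen, and the function name is likewise illustrative.

```python
def projected_volume_tb(initial_tb: float, months: float,
                        doubling_months: float = 6.0) -> float:
    """Estimate data volume after `months`, assuming it doubles
    every `doubling_months` (six months, per the article)."""
    return initial_tb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting volume, not a TGen figure
    for months in (6, 12, 24, 36):
        volume = projected_volume_tb(start_tb, months)
        print(f"{months:2d} months: {volume:,.0f} TB")
```

Under this assumption the volume quadruples every year and grows 64-fold over three years, which is why storage cost per terabyte and scaling headroom matter so much in the design.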
To overcome these challenges, TGen replaced a legacy Dell PowerEdge C2100 system with a cluster of Dell PowerEdge M710HD blade servers. The blades, which run CentOS Linux, are housed in three Dell M1000e modular blade enclosures. Dell Force10 C300 and S4810 10-Gigabit switches provide connectivity for the cluster’s 800 cores. All told, the cluster’s maximum performance is eight teraflops. Despite the dramatic improvement in processing power, the HPC cluster has a small footprint, with three times as many cores packed into the same floor space, and it consumes less energy because the blades use 25 percent less power per core than the legacy servers.
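For a sense of scale, the cluster figures can be turned into back-of-the-envelope numbers. The sketch below assumes decimal units (1 TB = 10^12 bytes), a single 10-gigabit link running at its ideal line rate with no protocol overhead, and peak performance spread evenly across all cores. Real-world throughput would be lower. The 3TB file size is the maximum cited earlier for the sequencing data objects, and the function names are illustrative.

```python
def peak_gflops_per_core(total_teraflops: float, cores: int) -> float:
    """Average peak gigaflops per core, assuming an even split."""
    return total_teraflops * 1000.0 / cores


def ideal_transfer_minutes(size_tb: float, link_gbps: float) -> float:
    """Minutes to move `size_tb` terabytes over one link at ideal
    line rate (assumption: no protocol or storage overhead)."""
    bits = size_tb * 1e12 * 8          # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9)  # gigabits per second to bits per second
    return seconds / 60.0


if __name__ == "__main__":
    # Figures from the article: 8 teraflops, 800 cores, 10-Gigabit switches.
    print(f"Peak per core: {peak_gflops_per_core(8.0, 800):.1f} GFLOPS")
    print(f"3TB over 10 Gb/s: {ideal_transfer_minutes(3.0, 10.0):.0f} minutes")
```

Even under these idealized assumptions, a single maximum-size file occupies a 10-gigabit link for well over half an hour, which helps explain the emphasis on parallel file access and on keeping data close to the compute resources.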
For data storage, TGen is building a multi-tier solution that combines multiple technologies as part of the Dell Fluid Data architecture. The technology implemented in this case keeps the data available for researchers and clinicians to collaborate on care while at the same time making it easy to manage and back up to archive. The storage architecture includes a high-performance file system for high-speed, parallel file access, plus Dell Compellent storage in support of more traditional applications, such as Microsoft SQL Server databases and laboratory file sharing. For back-up and archiving, TGen is leveraging the Dell DX Object Storage Platform. The DX platform is especially important because the cost per terabyte makes it affordable to store large amounts of data, scaling well into the petabytes, while allowing TGen’s researchers to use their advanced algorithms to mine these large data sets.
The next phase of the implementation is addressing the challenge of long-distance communication. As part of this clinical trial, TGen must partner on research projects with many different professionals from organizations around the world. In addition to patients and their families, the trial involves many clinicians, researchers and pathologists. Patient samples are collected and dissected by biologists, geneticists apply the latest genomics technology to the samples, and bioinformaticians mine the data. Add in the supporting biostatisticians, computer scientists and software engineers, and it becomes critical to create a high-throughput environment that everyone can use as targeted treatments are being developed. TGen and Dell are developing a cloud-based collaboration system to facilitate such interactions. The cloud-based platform provides a virtual library of data that researchers can access, and it allows data to be checked out and analyzed using HPC capabilities. This enables fluid integration between premise-based capabilities and virtual capabilities (the cloud) and provides the framework to move the data seamlessly through the research lifecycle, protect it, and make it available for future use. In addition, the system includes a high-performance, grid-based architecture to move the massive amounts of genomic data around quickly and securely. Data can be ingested at various sites, moved to the cloud and then made available for analysis either in premise-based HPC environments or in any HPC cloud environment. Another important feature of the IT strategy was to localize data near HPC capacity, both in the cloud and on-premise, to speed analysis and validation.
The HPC environment and the first phase of the cloud infrastructure are already yielding significant benefits. The project has increased TGen’s gene sequencing and analysis capacity by 1,200 percent and improved collaboration between physicians, genetic researchers, pharmacists and computer scientists involved in the clinical trial. The cloud infrastructure and portal technology are designed to efficiently manage the volume and complexity of the genomic data while keeping it secure and accessible to many. For this personalized medicine trial to be successful, doctors and researchers will need the ability to translate their patients’ genomic information into useful knowledge for targeted care both quickly and affordably. With TGen translational knowledge and the Dell high-performance cloud technology, researchers have accelerated the analysis of patient-specific genomic data from several days down to one day, resulting in a significant improvement in time to targeted treatment. For patients with neuroblastoma, this literally means the difference between life and death.
James Coffin, Ph.D.
Vice President and General Manager, Dell Healthcare and Life Sciences
As vice president and general manager of Dell Healthcare and Life Sciences, James Coffin leads teams in developing the latest innovative information technology solutions and services for healthcare, building the partner ecosystem and driving Dell’s thought leadership in healthcare. Prior to joining Dell, Coffin spent more than 12 years at IBM, where he held a variety of leadership positions. Prior to joining IBM, he was considered a leader in the application of computational chemistry techniques and high-performance computing to real world chemical and biological problems. Coffin holds a Ph.D. in physical chemistry from the University of Arkansas and a Bachelor of Science degree from Louisiana Tech. He studied at Cambridge University as a Cambridge Fulbright Postdoctoral Fellow and was a member of the scientific staff of the National Center for Supercomputing Applications at the University of Illinois. He lectures worldwide on innovation in the field of electronic medical records, personalized medicine, high-performance computing and leading edge in silico techniques to accelerate drug discovery.