HPC and grid computing — and the promises and challenges for biomedical applications — were the focus of the first annual Biomedical High Performance Computing Leadership Summit on Oct. 1-2, which was hosted by Harvard Medical School in the Medical School’s Rotunda in Boston.
The content was presented by a broad cross-section of researchers and computing experts from both public and private entities. Their presentations touched on several recurring themes and conclusions:
- Distributed and parallel file systems are a critical component of successful HPC architectures and applications.
- Interest in and research into virtualization technology and “in silico” simulations are accelerating at breakneck speed: The community needs to define appropriate usage scenarios, as well as how to support and manage them.
- Data, data and more data: IDC estimates that by 2010 the amount of digital data worldwide will be doubling every 11 hours, so get ready for yottabytes.
- Trends: There is a movement from 1U compute resources to blades.
- Sophisticated data management and analytics tools will be in increasing demand.
- Accessibility versus security and the role of institutional research governance remain open issues.
- The users/customers of the HPC infrastructure rule.
- The roles of open architecture, open standards and open source are still open questions.
- Storage, vast storage, persistent storage and on-demand archival retrieval remain vast and persistent requirements.
- Still the $64,000 question: Establishing costs and ROI for HPC and grid environments. The “big payback”?
- Individual researchers, especially, need to get over their Gollum-like obsessive hoarding of their (“My Precious”) data (“It’s mine!”).
Approximately 150 participants from a wide variety of public and private entities attended, with more than 18 presenters and three keynotes. The event was a great success: In the post-conference survey, more than half of the attendees rated the summit a 100 percent “valuable-use-of-their-time,” and more than two-thirds said the learning experience would “definitely be useful in their future work.”
The opening address, by Dr. Philip Papadopoulos (program director of grid and cluster computing, San Diego Supercomputer Center), focused the audience on “OS Virtualization and its Impact on Science and Cyber Environments.” It was followed by Dr. Phil Andrews’ (University of Tennessee/Oak Ridge National Laboratory) lively discussion of “TeraGrid Technologies and Applications” and the rapidly growing role of simulation and of managing massive data sets. Dr. Wolfgang Gentzsch (coordinator, D-Grid, Germany) covered the expanding European grid collaborations and reported on “Lessons Learned” regarding “Building and Maintaining Large-Scale Grid Infrastructures.” Although such infrastructures are cumbersome, costly and politically charged, Gentzsch’s report was positive and optimistic: The EU is making significant strides in developing and establishing grid computing at the university and research levels.
Dr. Brian Athey’s candid reflections on the reality of federating and growing high-performance computing and data environments to support research at the University of Michigan Medical School were greeted with a mix of sobriety and introspection. His graphic analysis of the amount of time, energy and money required to actually achieve a fully integrated, federated HPC environment for medical research was immensely helpful in citing the “gotchas” in rolling out highly complex and visible clinical and research computing networks.
Four speakers from the private sector, Dr. John Hurley (Boeing), Dr. Mark Linesch (HP), Chris Dagdigian (Bioteam Inc.) and Cheryl Doninger (SAS Institute), presented surprisingly varied views of how HPC and grid were being embraced and deployed in their corporations, whether for internal use or for product development. Despite the obvious differences between these industries, there were several common goals for implementing an HPC solution, including the need to manage information and data, to share infrastructure across multiple users and applications, and to collaborate with suppliers and partners to solve common problems.
Dagdigian, a brains-for-hire HPC/grid consultant, enacted a spirited real-world revival of Michael Caine’s “A Bridge Too Far” gap-analysis of vision versus reality in implementing grid infrastructure, notably on the hardware and storage fronts. He also stressed the impact on the environment via the cost of cooling large grid and HPC implementations.
Juxtaposed against Dagdigian’s true-grit “Trends from the Trenches” was Scott Collins’ (manager of scientific computing and software engineering, Janelia Farm Research Campus, Howard Hughes Medical Institute (HHMI)) portrayal of The Janelia Farm Information Infrastructure for HPC at HHMI. Here, and elsewhere in the conference, presenters momentarily waxed poetic in their depictions of a “Field of Dreams” approach (“build it and then we’ll solve it”) to harnessing HPC and grid infrastructure to solve world-class computing problems. Collins’ rendering of Janelia Farm was as bucolic as it was inspirational, and the Farm is in an enviable position to prove what can be done with HPC and grid environments in medical science.
A surprise visit and animated presentation by Dr. Zak Kohane quickly surfaced the subliminal notion of translational medicine and CTSA by focusing in on the role of HPC and grid computing to help dramatically close the gap in bench-to-clinician-to-patient information and communication. Here, the use of HPC computational infrastructure to really impact the quality of health care through the availability of evidence gleaned and reported in real time can and does save lives. Kohane’s one slide on Rofecoxib (Vioxx, Ceoxx, Ceeoxx) told the whole story of how clinical metrics, measurement and reporting across multiple clinical datasets are invaluable in improving quality of care, and how industrial-strength computing environments, if managed well, can dramatically improve clinical care with the right data mining, clinical performance “intelligence” and data management tools.
An evening keynote by Dr. John Halamka (CIO, Harvard Medical School and Beth Israel Deaconess Medical Center; chair, Health Information Technology Standards Panel) helped to transpose the summit’s content into a real-time bioinformatics and health care perspective in his presentation entitled “Emergence and Convergence: National Health Information Standards, Personal Genomes and Shared High Performance Computing.”
Themes and best practices for managing shared computing infrastructure for creating flexible clusters and grids specifically for the sciences were traversed by Dr. Jay Boisseau (Texas Advanced Computing Center), Dr. Mark Ellisman (National Center for Microscopy and Imaging Research; founder, BIRN), Dr. Rick Stevens (Argonne National Laboratory and the University of Chicago), and Mary Kratz (University of Michigan).
Some specific HPC and grid-ready applications were demonstrated as well, including a joint presentation on clinical trials simulation by Dr. Peter Westfall (Texas Tech University) and SAS’s Doninger. This application leverages an HPC environment to process large data sets and complex algorithms for clinical trials simulation to achieve time savings while reducing the cost of a clinical trial by millions of dollars.
Administrative and technical staff from North Carolina State University table-topped a singularly cutting-edge virtual high-performance computing platform supporting its campus-wide Virtual Computing Environment. Using any browser, students and faculty can dynamically spawn a remote, real-time customized computing cloud over the Internet to access and “run” the numerous software applications they need, notably in science, technology, engineering and math. Twenty-four hours a day, seven days a week, 365 days a year, and at the click of a mouse, NCSU students and faculty can select applications and saved data sets from a library of proprietary and open source images and run them on Linux, Solaris and numerous Windows environments from anywhere in the world, without downloading anything to their individual laptops, desktops or laboratory workstations. By leveraging the advantages of HPC, NCSU has truly established software-as-a-service and an SOA in higher education, within a major engineering school here in the United States. Put simply, this is a grid that works every minute of every day.
Another interesting feature of this event was the ability for any of the attendees to submit polling questions that were presented to the audience for voting before each break. Following the results of the survey questions, a series of trivia questions kept the mood light and the attendees engaged. (The winner donated his cash prize to Children’s Hospital.) Marcos Athanasoulis, director of research information technology for Harvard Medical School, was the summit’s planning committee chair and host. His opening remarks were poignant and humorous: “On behalf of Harvard University and the Medical School, may I welcome you to the ‘Land of a Thousand CIOs.’”
The logistics were fabulous, the networking exceptional, the food organic and gourmet, and the quality of the presentations and attendees top-notch: Plan now on attending next year’s Biomedical HPC Leadership Summit. Potential 2008 presenters should contact Athanasoulis at [email protected].
About the Authors
Cheryl Doninger and Charles Coleman are employees of SAS Institute in Cary, N.C., and contribute articles and content in the fields of grid, high-performance and bio-medical computing. The opinions stated here are expressly those of the authors and do not represent the opinions of SAS Institute or Tabor Communications.
(c) Copyright Charles Coleman, PhD, and Cheryl Doninger, 2007