October 19, 2007

Summit Spotlights Biomedical High Performance Computing

by Charles Coleman, Ph.D. & Cheryl Doninger

HPC and grid computing — and the promises and challenges for biomedical applications — were the focus of the first annual Biomedical High Performance Computing Leadership Summit on October 1-2, hosted by Harvard Medical School in the Medical School’s Rotunda in Boston.

The content was presented by a broad cross-section of researchers and computing experts from both public and private entities. Their presentations touched on several recurring themes and conclusions:

  • Distributed and parallel file systems are a critical component of successful HPC architecture and applications.
  • Interest in, and research into, virtualization technology and “in silico” simulations are accelerating at breakneck speed; the community needs to define appropriate usage scenarios and how to support and manage them.
  • Data, data, and more data: by 2010, IDC estimates that the amount of digital data worldwide will be doubling every 11 hours — get ready for yottabytes.
  • Trends: there is a movement from 1U compute resources to blades.
  • Sophisticated data management and analytics tools will be in increasing demand.
  • Accessibility versus security and the role of institutional research governance remain issues.
  • The users/customers of the HPC infrastructure rule.
  • The roles of open architecture, open standards, and open source are still open questions.
  • Storage, vast storage, persistent storage and on-demand archival retrieval remain vast and persistent requirements.
  • Still the $64,000 question: establishing costs and ROIs for HPC and grid environments: the “big payback”?
  • Individual researchers, especially, need to get over their Gollum-like obsessive hoarding of their (“My Precious”) data — (“It’s Mine!”).

Approximately 150 participants from a wide variety of public and private entities attended, with more than eighteen presenters and three keynotes. This event was a great success, as evidenced by more than half of the attendees rating this Summit a 100 percent “valuable-use-of-their-time” in the post-conference survey, while more than two-thirds claimed the learning experience would “definitely be useful in their future work.”

The opening address, by Dr. Philip Papadopoulos (Program Director of Grid and Cluster Computing, University of California, San Diego Supercomputer Center), focused the audience on OS Virtualization and its Impact on Science and Cyber Environments, followed by Dr. Phil Andrews’ (University of Tennessee/Oak Ridge National Laboratory) lively discussion regarding TeraGrid Technologies and Applications and the rapidly growing role of simulation and managing massive data sets. Dr. Wolfgang Gentzsch (Germany’s D-Grid Coordinator) covered the expanding European grid collaborations and reported on “Lessons Learned” regarding Building and Maintaining Large-Scale Grid Infrastructures. Although such infrastructures are cumbersome, costly, and politically charged, Dr. Gentzsch’s report was positive and optimistic: the EU is making significant strides in developing and establishing grid computing at the university and research levels.

Dr. Brian Athey’s candid reflections on the reality of federating and growing high performance computing and data environments to support research at the University of Michigan Medical School were greeted with a mixture of sobriety and introspection. His graphic analysis of the amount of time, energy and money required to actually achieve a fully integrated, federated HPC environment for medical research was immensely helpful in identifying the “gotchas” in rolling out highly complex and visible clinical and research computing networks.

Four speakers from the private sector (Dr. John Hurley, Boeing; Dr. Mark Linesch, HP; Chris Dagdigian from Bioteam, Inc.; and Cheryl Doninger, SAS Institute) presented surprisingly varied views of how HPC and grid were being embraced and deployed in their corporations for either internal use or product development. Despite the obvious differences between these industries, there were several common goals for implementing an HPC solution, including the need to manage information and data, share infrastructure across multiple users and applications, and collaborate with suppliers and partners to solve common problems.

Chris Dagdigian, a brains-for-hire HPC/grid consultant, enacted a spirited real-world revival of Michael Caine’s “A Bridge Too Far” gap-analysis of vision versus reality in implementing grid infrastructure, notably on the hardware and storage fronts. Chris also stressed the impact on the environment via the cost of cooling large grid and HPC implementations.

Juxtaposed against Chris’ true-grit “Trends from the Trenches” was Scott Collins’ (Manager of Scientific Computing and SW Engineering, Janelia Farm Research Campus, Howard Hughes Medical Institute “HHMI”) portrayal of The Janelia Farm Information Infrastructure for HPC at HHMI. Here, and elsewhere in the conference, presenters such as Scott momentarily waxed poetical in their depiction of a field-of-dreams approach (“build it and then we’ll solve it”) to harnessing HPC and grid infrastructure to solve world-class computing problems. Scott’s rendering of Janelia Farm was as bucolic as it was inspirational, and The Farm is in an enviable position to prove what can be done with HPC and grid environments in medical science.

A surprise visit and animated presentation by Dr. Zak Kohane quickly brought the underlying theme of translational medicine and CTSA to the surface by focusing on the role of HPC and grid computing in dramatically closing the gap in bench-to-clinician-to-patient information and communication. Using HPC infrastructure to make evidence available and reportable in real time improves the quality of healthcare and can and does save lives. Zak’s one slide on Rofecoxib (Vioxx, Ceoxx, Ceeoxx) told the whole story: clinical metrics, measurement and reporting across multiple clinical datasets are invaluable in improving quality of care, and “industrial strength” computing environments, if managed well, can dramatically improve clinical care given the right data mining, clinical performance “intelligence” and data management tools.

An evening keynote by Dr. John Halamka (CIO Harvard Medical School and Beth Israel Deaconess Medical Center and Chair of the Health Information Technology Standards Panel “HITSP”) helped to transpose the Summit’s content into a real-time bioinformatics and healthcare perspective in his presentation entitled “Emergence and Convergence: National Health Information Standards, Personal Genomes and Shared High Performance Computing.”

Themes and best-practices for managing shared computing infrastructure for creating flexible clusters and grids specifically for the sciences were traversed by Dr. Jay Boisseau (University of Texas Austin, Texas Advanced Computing Center), Dr. Mark Ellisman (National Center for Microscopy and Imaging Research and Founder of BIRN), Dr. Rick Stevens (Argonne National Laboratory and the University of Chicago), and Mary Kratz (University of Michigan).

Some specific HPC and grid-ready applications were demonstrated as well, including a joint presentation by Dr. Peter Westfall (Texas Tech University) and Cheryl Doninger (Director, R&D, Grid Computing, SAS) on clinical trials simulation. This application leverages an HPC environment to process large data sets and complex algorithms for clinical trials simulation to achieve time-savings while reducing the cost of a clinical trial by millions of dollars.

Administrative and technical staff from North Carolina State University gave a table-top demonstration of a cutting-edge Virtual High-Performance Computing Platform supporting their campus-wide Virtual Computing Environment. It allows students and faculty to access and “run” the numerous software applications they need (notably in science, technology, engineering and math), using any browser to dynamically spawn a remote, real-time customized computing cloud over the Internet. NC State students and faculty can select applications and saved data sets from a library of proprietary and open source images and run them on Linux, Solaris and numerous Windows environments 24x7x365 at the click of a mouse from anywhere in the world, without downloading anything to their individual laptops, desktops or laboratory workstations. Truly, Software-as-a-Service and an SOA have been established in higher education within a major engineering school here in the United States by leveraging the advantages of HPC on a daily basis. Put simply, this is a grid that works every minute of every day.

Another interesting feature of this event was the ability for any of the attendees to submit polling questions that were presented to the audience for voting before each break. Following the results of the survey questions, a series of trivia questions kept the mood light and the attendees engaged. (The winner donated his cash prize to Children’s Hospital.)

Marcos Athanasoulis, Director of Research Information Technology, Harvard Medical School, was the summit’s Planning Committee Chair and Host. His opening remarks were poignant and humorous: “On behalf of Harvard University and the Medical School, may I welcome you to The Land of 1000 CIOs.” The logistics were fabulous, the networking exceptional, the food organic and gourmet, and the quality of the presentations and attendees top-notch. Plan now on attending next year’s Biomedical HPC Leadership Summit! Potential 2008 presenters should contact Marcos at marcos@hms.harvard.edu.


(c) Copyright Charles Coleman, PhD, and Cheryl Doninger, 2007

Cheryl Doninger (cheryl.doninger@sas.com) and Charles Coleman, PhD (charles.coleman@sas.com) are employees of SAS Institute (www.sas.com/grid) in Cary, North Carolina, and contribute articles and content in the field of grid, high-performance, and bio-medical computing. The opinions stated here are expressly those of the authors and do not represent the opinions of SAS Institute or Tabor Communications.
