September 13, 2011
Health care analytics is an emerging application area that promises to help cut costs and provide better patient outcomes. Reaching that goal, though, requires sophisticated software that can mimic some of the intelligence of real live physicians. At Lund University and Skåne University in Sweden, researchers are attempting to do just that by building a model of heart-transplant recipients and donors to improve survival times.
The so-called "survival model" is designed to discover the optimal matches between recipients and donors for heart transplants. It takes into account such factors as age, blood type (both donor and recipient), weight, gender, and time during a transplant when there is no blood flow to the heart. Just analyzing those six variables leads to about 30,000 distinct combinations to track. When you want to match tens of thousands of recipients and donors across that spread of combinations, you need a rather sophisticated software model and some serious computing horsepower.
To build the application, the Lund researchers used MATLAB and a set of related MathWorks libraries, namely the Neural Network Toolbox, the Parallel Computing Toolbox, and the MATLAB Distributed Computing Server. With that, they built their predictive artificial neural network (ANN) models, in this case, a simulation that predicts survival rates for heart transplant patients based on the suitability of the donor match. The ANN models are "trained" using donor and recipient data encapsulated in two databases: the International Society for Heart and Lung Transplantation (ISHLT) registry and the Nordic Thoracic Transplantation Database (NTTD).
The key software technology for the ANN application is MathWorks' Neural Network Toolbox. The package contains tools for designing and simulating neural networks, which can be used for artificial intelligence-type applications such as pattern recognition, quantum chemistry, speech recognition, game-playing and process control. These types of applications don't lend themselves easily to the type of formal analysis done in traditional computing.
For the ANN models, training involves correlating donor and recipient data, such that the risk factors are weighted accurately. If done correctly, the simulations can become adept at associating these factors with the heart transplant survival rates. In this case, the results from the simulations were used to pick out the best and worst donors for any particular recipient.
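The selection step described above can be sketched in a few lines of Python. This is only an illustration and not the Lund team's MATLAB code: the function name `best_and_worst_donor`, the `predict_survival` callable that stands in for a trained model, and the toy scores are all hypothetical.

```python
def best_and_worst_donor(recipient, donors, predict_survival):
    """Return (best, worst) donors for one recipient.

    predict_survival is any callable (recipient, donor) -> predicted
    survival. It is a stand-in for a trained model (an assumption).
    """
    if not donors:
        raise ValueError("need at least one donor")
    # Score every candidate donor against this recipient.
    scored = [(predict_survival(recipient, donor), donor) for donor in donors]
    best = max(scored, key=lambda pair: pair[0])[1]
    worst = min(scored, key=lambda pair: pair[0])[1]
    return best, worst


# Placeholder scores, invented purely to exercise the function.
toy_scores = {"donor_a": 0.62, "donor_b": 0.81, "donor_c": 0.47}
best, worst = best_and_worst_donor(
    "recipient_1", list(toy_scores), lambda r, d: toy_scores[d]
)
print(best, worst)  # donor_b donor_c
```

Passing the predictor in as a callable keeps the ranking logic separate from whatever model produces the survival estimates.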
The ultimate goal is to determine the mean survival times after transplantation for waiting recipients, so that doctors can make the best possible decision with regard to matches. In the research study, they analyzed about 10,000 patients who had already received transplants in order to verify the accuracy of the algorithms.
What they found was that the ANN models could increase the five-year survival rate by 5 to 10 percent compared to the traditional selection criteria applied by practicing physicians. Perhaps more importantly, using a randomized trial based on preliminary results, approximately 20 percent more patients would be considered for transplantation under these models, says Dr. Johan Nilsson, Associate Professor in the Division of Cardiothoracic Surgery at Lund University.
Because of the combinatorial load of the recipient-donor variables, the models are very compute-intensive. On a relatively small cluster, the MATLAB-derived ANN simulation took about five days. That was significantly better than the open source software packages (R and Python) they started out with. In that environment, runs took about three to four weeks and were beset with crashes and inaccurate results.
To run the simulation, the researchers used a nine-node Apple Xserve cluster (which includes a head node and a filesharing node), along with 16 TB of disk, all lashed together with a vanilla GigE network. Memory size on the nodes ranged from 24 to 48 GB. According to Nilsson, with the latest MATLAB configuration, they use 64 CPUs to run the ANN simulation.
Nilsson, who is a physician, programmed the application himself, noting that the MATLAB environment was easy to set up and use and that it required no deep knowledge of parallel computing. The biggest roadblock he encountered was the need to customize an error function (the MATLAB Neural Network Toolbox does not have a cross-entropy error routine). There were also some problems encountered in setting up the Xserve cluster, but once they replaced Apple's Xgrid protocol with the MATLAB Distributed Computing Server, many of those problems disappeared.
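For readers unfamiliar with the term, a cross-entropy error for a yes/no outcome (for example, whether a patient survived a given period) can be sketched as follows. This is a generic textbook formulation in Python, not Nilsson's MATLAB routine; the function name and the clipping constant `eps` are assumptions made for illustration.

```python
import math


def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Mean cross-entropy between 0/1 targets and predicted probabilities."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    if not targets:
        raise ValueError("need at least one sample")
    total = 0.0
    for t, p in zip(targets, predictions):
        # Clip so that a prediction of exactly 0 or 1 never hits log(0).
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


# Confident, correct predictions give a smaller error than wrong ones.
good = binary_cross_entropy([1, 0], [0.9, 0.1])
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
print(good < bad)  # True
```

Unlike a squared-error measure, this error grows without bound as a confident prediction turns out wrong, which is one common reason it is preferred for classification-style outputs.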
The Apple Xserve cluster is not exactly state of the art for high performance computing these days. Presumably with a late-model HPC setup, they could cut the five-day turnaround time for the simulation even more, which would speed up the research even further.
In the short term, the Lund and Skåne team intend to continue to optimize the software and explore other solutions like regression tree and logistic regression algorithms, as well as support vector machines. In parallel, they want to start transitioning the technology into a clinical setting.
According to Nilsson, once they've fully cooked the models, they can do away with the high performance computing environment. "In a future clinical setting," he says, "the application could be used on any desktop computer, and the matching process will take only seconds to a couple of minutes."