Health care analytics is an emerging application area that promises to help cut costs and improve patient outcomes. Reaching that goal, though, requires sophisticated software that can mimic some of the intelligence of real, live physicians. At Lund University and Skåne University Hospital in Sweden, researchers are attempting to do just that by building a model of heart-transplant recipients and donors to improve survival times.
The so-called “survival model” is designed to discover the optimal matches between recipients and donors for heart transplants. It takes into account such factors as donor and recipient age, blood type (both donor and recipient), weight, gender, and ischemic time, the period during a transplant when there is no blood flow to the heart. Just analyzing those six variables leads to about 30,000 distinct combinations to track. When you want to match tens of thousands of recipients and donors across that spread of combinations, you need a rather sophisticated software model and some serious computing horsepower.
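The arithmetic behind a figure like that is multiplicative: once each variable is cut into a handful of bands, the number of distinct cells is the product of the band counts. The sketch below uses a hypothetical discretization; the band counts and the names `BINS` and `combination_count` are illustrative assumptions, since the article does not say how the study binned its variables. These particular choices simply land in the same range as the quoted 30,000.

```python
from math import prod

# Hypothetical discretization, for illustration only: the article does not
# say how the Lund study binned each variable.
BINS = {
    "recipient_age_band": 6,
    "donor_age_band": 6,
    "blood_type_pair": 16,   # 4 ABO groups for the donor x 4 for the recipient
    "weight_band": 4,
    "gender_pair": 4,        # donor sex x recipient sex
    "ischemic_time_band": 3,
}


def combination_count(bins: dict) -> int:
    """Number of distinct cells in the full cross-product of the binned variables."""
    return prod(bins.values())


if __name__ == "__main__":
    print(combination_count(BINS))                    # 27648
    print(combination_count({**BINS, "extra": 5}))    # 138240
```

Adding just one more five-band variable multiplies the total by five, which is why matching across many variables quickly calls for serious compute.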
To build the application, the Lund researchers used MATLAB and a set of related MathWorks products: the Neural Network Toolbox, the Parallel Computing Toolbox, and the MATLAB Distributed Computing Server. With these tools they built predictive artificial neural network (ANN) models, in this case a simulation that predicts survival rates for heart transplant patients based on the suitability of the donor match. The ANN models are “trained” using donor and recipient data drawn from two databases: the International Society for Heart and Lung Transplantation (ISHLT) registry and the Nordic Thoracic Transplantation Database (NTTD).
The key software technology for the ANN application is MathWorks’ Neural Network Toolbox. The package contains tools for designing and simulating neural networks, which can be used for artificial intelligence-type applications such as pattern recognition, quantum chemistry, speech recognition, game playing, and process control. These kinds of applications don’t lend themselves easily to the formal analysis done in traditional computing.
For the ANN models, training involves correlating donor and recipient data, such that the risk factors are weighted accurately. If done correctly, the simulations can become adept at associating these factors with the heart transplant survival rates. In this case, the results from the simulations were used to pick out the best and worst donors for any particular recipient.
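Picking the best and worst donors for a recipient amounts to scoring every candidate pair with the trained model and sorting the scores. The following Python sketch assumes a stand-in model: a single logistic unit (logistic regression, the simplest one-layer network) with made-up weights and three hypothetical input features. None of the feature names, coefficients, or helper functions (`pair_features`, `predicted_survival`, `best_and_worst`) come from the Lund study, and the blood-type feature ignores real ABO compatibility rules for simplicity.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical coefficients standing in for a trained model. They are NOT
# taken from the Lund study and carry no clinical meaning.
WEIGHTS: Dict[str, float] = {
    "age_gap": -0.03,         # per year of donor/recipient age difference
    "blood_type_match": 0.4,  # 1.0 if ABO groups are identical, else 0.0
    "ischemic_hours": -0.25,  # expected hours without blood flow to the heart
}
BIAS = 1.2


def pair_features(recipient: dict, donor: dict) -> Dict[str, float]:
    """Turn one hypothetical recipient/donor pair into model inputs."""
    return {
        "age_gap": abs(recipient["age"] - donor["age"]),
        "blood_type_match": 1.0 if recipient["blood_type"] == donor["blood_type"] else 0.0,
        "ischemic_hours": donor["ischemic_hours"],
    }


def predicted_survival(features: Dict[str, float]) -> float:
    """Single logistic unit: a value in (0, 1) read as five-year survival probability."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def best_and_worst(recipient: dict, donors: List[dict]) -> Tuple[Tuple[float, str], Tuple[float, str]]:
    """Score every donor for one recipient; return the top and bottom (score, id)."""
    scored = sorted(
        ((predicted_survival(pair_features(recipient, d)), d["id"]) for d in donors),
        reverse=True,
    )
    return scored[0], scored[-1]


if __name__ == "__main__":
    recipient = {"age": 50, "blood_type": "A"}
    donors = [
        {"id": "D1", "age": 48, "blood_type": "A", "ischemic_hours": 2.0},
        {"id": "D2", "age": 30, "blood_type": "O", "ischemic_hours": 4.0},
        {"id": "D3", "age": 55, "blood_type": "A", "ischemic_hours": 3.0},
    ]
    best, worst = best_and_worst(recipient, donors)
    print("best:", best[1], "worst:", worst[1])  # prints: best: D1 worst: D2
```

In a real application the trained ANN would take the place of `predicted_survival`; the ranking step itself would stay the same.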
The ultimate goal is to estimate the mean survival time after transplantation for each waiting recipient, so that doctors can make the best possible decision about matches. To verify the accuracy of the algorithms, the researchers analyzed about 10,000 patients who had already received transplants.
What they found was that the ANN models could raise the five-year survival rate by 5 to 10 percent compared with the traditional selection criteria used by practicing physicians. Perhaps more importantly, based on preliminary results from a randomized trial, approximately 20 percent more patients would be considered for transplantation under these models, says Dr. Johan Nilsson, Associate Professor in the Division of Cardiothoracic Surgery at Lund University.
Because of the combinatorial load of the recipient-donor variables, the models are very compute-intensive. On a relatively small cluster, the MATLAB-based ANN simulation took about five days. That was significantly better than the open source software packages (R and Python) they started out with. In that environment, runs took about three to four weeks and were beset by crashes and inaccurate results.
To run the simulation, the researchers used a nine-node Apple Xserve cluster (which includes a head node and a file-sharing node), along with 16 TB of disk, all lashed together with a vanilla GigE network. Memory on the nodes ranged from 24 to 48 GB. According to Nilsson, with the latest MATLAB configuration, they use 64 CPUs to run the ANN simulation.
Nilsson, who is a physician, programmed the application himself, noting that the MATLAB environment was easy to set up and use and required no deep knowledge of parallel computing. The biggest roadblock he encountered was the need to write a custom error function, since the MATLAB Neural Network Toolbox did not have a cross-entropy error routine. The team also ran into some problems setting up the Xserve cluster, but once they replaced Apple’s Xgrid protocol with the MATLAB Distributed Computing Server, many of those problems disappeared.
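A cross-entropy error function measures how far a network's predicted probabilities are from the observed outcomes, and backpropagation also needs its derivative. For binary outcomes (for example, survived five years or not), the mean cross-entropy is E = -(1/N) Σ [t ln y + (1 - t) ln(1 - y)], where t is the observed outcome and y the network's output. The Python sketch below implements that formula and its gradient; it illustrates the general technique only and is not Nilsson's MATLAB code. The clipping constant `EPS` is an added assumption that keeps the logarithm away from zero.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # assumption: clip outputs so log(0) is never evaluated


def _clip(y: float) -> float:
    return min(max(y, EPS), 1.0 - EPS)


def cross_entropy(targets: Sequence[float], outputs: Sequence[float]) -> float:
    """Mean binary cross-entropy: E = -(1/N) * sum(t*ln(y) + (1-t)*ln(1-y))."""
    if not targets or len(targets) != len(outputs):
        raise ValueError("targets and outputs must be non-empty and the same length")
    total = 0.0
    for t, y in zip(targets, outputs):
        y = _clip(y)
        total += t * math.log(y) + (1.0 - t) * math.log(1.0 - y)
    return -total / len(targets)


def cross_entropy_grad(targets: Sequence[float], outputs: Sequence[float]) -> List[float]:
    """dE/dy_i = (y_i - t_i) / (N * y_i * (1 - y_i)), the term backprop starts from."""
    if not targets or len(targets) != len(outputs):
        raise ValueError("targets and outputs must be non-empty and the same length")
    n = len(targets)
    return [(_clip(y) - t) / (n * _clip(y) * (1.0 - _clip(y))) for t, y in zip(targets, outputs)]


if __name__ == "__main__":
    t = [1, 0, 1]
    y = [0.8, 0.3, 0.6]
    print(round(cross_entropy(t, y), 4))  # 0.3635
    print(cross_entropy_grad(t, y))
```

When the output unit is a sigmoid, the chain rule cancels the y(1 - y) factor, so the gradient with respect to the unit's pre-activation reduces to (y - t)/N. That simplification is one reason cross-entropy pairs well with sigmoid outputs in classification networks.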
The Apple Xserve cluster is not exactly state of the art for high performance computing these days. With a late-model HPC setup, they could presumably cut the five-day turnaround time even further, which would speed up the research.
In the short term, the Lund and Skåne team intends to continue optimizing the software and to explore other approaches, such as regression trees, logistic regression, and support vector machines. In parallel, they want to start transitioning the technology into a clinical setting.
According to Nilsson, once they’ve fully cooked the models, they can do away with the high performance computing environment. “In a future clinical setting,” he says, “the application could be used on any desktop computer, and the matching process will take only seconds to a couple of minutes.”