Cancer Research: A Supercomputing Perspective

By Aaron Dubrow

May 31, 2017

Cancer, the second-leading cause of death in the U.S. after heart disease, kills more than 500,000 citizens per year, including about 2,000 children.

In 2016, then Vice President Joe Biden launched the Cancer Moonshot, saying: “I know that we can help solidify a genuine global commitment to end cancer as we know it today —  and inspire a new generation of scientists to pursue new discoveries and the bounds of human endeavor.”

The importance of high performance computing (HPC) in cancer research was recognized by the Cancer Moonshot Task Force report, and by then Vice President Joe Biden and Energy Secretary Ernest Moniz.

“Supercomputers are key to the Cancer Moonshot,” Moniz wrote. “These exceptionally high-powered machines have the potential to greatly accelerate the development of cancer therapies by finding patterns in massive datasets too large for human analysis. Supercomputers can help us better understand the complexity of cancer development, identify novel and effective treatments, and help elucidate patterns in vast and complex data sets that advance our understanding of cancer.”

With complex, non-linear signaling networks, multiscale dynamics from the quantum to the macro level, and giant, complex datasets of patient responses, cancer is quite possibly the ultimate in HPC problems.

“What could be more complicated and more important?” said J. Tinsley Oden, a computational researcher at The University of Texas at Austin applying uncertainty quantification to cancer treatment predictions. “At each step, it has the most complex features. It is really a garden of rich, important problems that are in the path of many of the developments that we’ve been working on for years.”

Infographic depicting TACC’s multi-domain approach to fighting cancer.

Hundreds of oncologists, biologists and computer scientists use the HPC systems at the Texas Advanced Computing Center (TACC) to understand the fundamental nature of cancer biology and to improve cancer treatments. Their work addresses a range of cancer types and treatment modalities, and spans applied and fundamental research.

Though diverse in their specific targets, the approaches they use can be loosely grouped into seven broad methodologies: molecular simulation; bioinformatics; mathematical modeling; computational treatment planning; quantum calculation; clinical trial design; and machine learning. The following sections describe and provide examples of each.

Molecular Simulations

Simulating protein and drug interactions at the molecular level enables scientists to understand the mechanics of cancer and design more effective treatments.

For Rommie Amaro, professor of Chemistry and Biochemistry at the University of California, San Diego, this means uncovering new pockets in tumor protein 53 (p53) — “the guardian of the genome” — which plays a crucial role in conserving the stability of DNA and preventing mutations.

A model of the full-length p53 protein bound to DNA as a tetramer. The surface of each p53 monomer is depicted with a different color. [Courtesy: Özlem Demir, University of California, San Diego]

In approximately 50 percent of all human cancers, p53 is mutated and rendered inactive. Reactivating mutant p53 using small molecules has therefore been a long-sought anticancer therapeutic strategy.

In September 2016, writing in the journal Oncogene, Amaro reported results of the largest atomic-level simulation of p53 to date, comprising more than 1.5 million atoms. The simulations, enabled by the Stampede supercomputer at TACC, helped identify new binding sites on the surface of the protein that could potentially reactivate p53.

“When most people think about cancer research they probably don’t think about computers,” she said. “But biophysical models are getting to the point where they have a great impact on the science.”

Virtual drug screening is another important HPC application for cancer research. Shuxing Zhang, professor of experimental therapeutics at MD Anderson Cancer Center, used molecular simulations on TACC’s Lonestar5 system to screen 1,448 Food and Drug Administration-approved small molecule drugs to determine which had the molecular features needed to bind and inhibit TNIK, an enzyme that plays a key role in cell signaling in colon cancer.

Zhang discovered that mebendazole, an FDA-approved drug that fights parasites, could effectively bind to TNIK and inhibit its enzymatic activity. He reported his results in Nature Scientific Reports in September 2016.

“Such advantages render the possibility of quickly translating the discovery into a clinical setting for cancer treatment in the near future,” Zhang wrote.

Bioinformatics

The human genome consists of three billion base pairs, so identifying single mutations by sight simply isn’t possible. For that reason, the field of bioinformatics — which uses computing and software to identify patterns and differences in biological data — has been an enormous boon for cancer researchers.
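As a toy illustration of the simplest kind of pattern matching involved, the sketch below compares a sample sequence against a reference of equal length and reports single-base substitutions. The two sequences and the function name `find_point_mutations` are invented for illustration. Real pipelines align millions of short reads and must also handle insertions, deletions, and sequencing error.

```python
def find_point_mutations(reference, sample):
    """Return (position, ref_base, sample_base) for each single-base mismatch.

    Positions are 1-based. Both sequences must already be aligned and of
    equal length; insertions and deletions are out of scope for this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


# Hypothetical 20-base fragments, not real genomic data.
reference = "ATGGAGGAGCCGCAGTCAGA"
sample = "ATGGAGGAGCCACAGTCAGT"

mutations = find_point_mutations(reference, sample)
for pos, ref_base, alt_base in mutations:
    print(f"position {pos}: {ref_base} -> {alt_base}")
```

A scan like this is linear in sequence length, which is why even a three-billion-base comparison is routine for software while being hopeless by eye.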

But bioinformatics is more than simple, one-to-one pattern matching.

A heat map showing differences in gene expression between primary tumors and cultured cell lines. Each row is a gene and each column is a tumor or cell sample. In the heat map, red indicates high expression and blue indicates low expression. NHA refers to normal human astrocytes, star-shaped glial cells of the central nervous system. [Courtesy: Amelia Weber Hall, Iyer lab]

“When you move into multi-dimensional, time-series, or population-level studies, the algorithms can get a lot more computationally intensive,” said Matt Vaughn, TACC’s Director of Life Sciences Computing. “This requires resources like those at TACC, which help large numbers of researchers explore the complexity of cancer genomes by providing elastic, large-scale computing capability.”

For Vishy Iyer, a molecular biologist at The University of Texas at Austin (UT Austin), and his collaborators, access to TACC’s Stampede supercomputer helps them mine reams of data from The Cancer Genome Atlas to identify genetic variants and subtle correlations that affect gene expression in tumors.

“TACC has been vital to our analysis of cancer genomics data, both for providing the necessary computational power and the security needed for handling sensitive patient genomic datasets,” Iyer said.

In February 2016, Iyer and a team of researchers from UT Austin and MD Anderson Cancer Center reported in Nature Communications on a genome-wide transcriptome analysis of the two types of cells that make up the prostate gland. They identified cell-type-specific gene signatures that were associated with aggressive subtypes of prostate cancer and adverse clinical responses.

“This knowledge can be helpful in the development of more targeted therapies that seek to eliminate cancer at its origin,” Iyer said.

Using a similar methodology, Iyer and a team of researchers from UT Austin and the National Cancer Institute identified a transcription factor associated with an aggressive type of lymphoma that is highly correlated with poor therapeutic outcomes. They published their results in the Proceedings of the National Academy of Sciences in January 2016.

Whereas Iyer, an experienced HPC user, develops custom tools for his analyses, a much larger number of researchers access Stampede and comparable systems through scientific gateways. One prominent gateway is Galaxy, an open source bioinformatics platform that serves 30,000 researchers and runs more than 3,000 compute jobs a day.

Since 2014, TACC has powered the data analyses for a large percentage of Galaxy users, allowing researchers to solve tough problems in cases where their personal computer or campus cluster is not sufficient. Of those researchers, a significant subset use the site to analyze cancer genomes.

“Galaxy can be used to identify tumor mutations that drive cancer growth, find proteins that are overexpressed in a tumor, as well as for chemo-informatics and drug discovery,” said Jeremy Goecks, Assistant Professor of Biomedical Engineering and Computational Biology at Oregon Health and Science University and one of Galaxy’s principal investigators.

Goecks estimates that hundreds of researchers each year use the platform for cancer research, himself included. Because cancer patient data is closely protected, the bulk of this usage involves either publicly available cancer data or data on cancer cell lines: immortalized cells that reproduce in the lab and are used to study how cancer reacts to different drugs or conditions.

“This is an ideal marriage of TACC having tremendous computing power with scalable architecture and Galaxy coming along and saying, we’re going to go the last mile and make sure that people who can’t normally use this hardware are able to,” Goecks said.

Mathematical Modeling

While some researchers believe bioinformatics will rapidly advance the understanding and treatment of cancer, others think a better approach is to mathematize cancer: to uncover the fundamental formulas that represent how cancer, in its varied forms, behaves.

At the Center for Computational Oncology at UT Austin, researchers are developing complex computer models to predict how cancer will progress in a specific individual.

Each factor involved in the tumor response, whether it is the speed with which chemotherapeutic drugs reach the tissue or the degree to which cells signal each other to grow, is characterized by a mathematical equation that captures its essence. These models are combined, parameterized, and initialized with patient-specific data.
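To make the idea concrete, here is a minimal sketch of one such equation: logistic tumor growth with a drug-induced kill term, dN/dt = r N (1 - N/K) - k c(t) N. It is a textbook-style toy and not the Center's model. Every name and value below (`growth_rate`, `carrying_capacity`, `kill_rate`, the 10-day dosing window) is a hypothetical stand-in for quantities that would be fit to patient-specific data.

```python
def simulate_tumor(n0, growth_rate, carrying_capacity, kill_rate,
                   drug_conc, days, dt=0.01):
    """Integrate dN/dt = r*N*(1 - N/K) - k*c(t)*N with forward Euler.

    drug_conc is a function of time (in days) returning drug concentration.
    Returns the cell count sampled once per day, starting with day 0.
    """
    steps_per_day = round(1 / dt)
    n = n0
    daily = [n]
    for step in range(days * steps_per_day):
        t = step * dt
        growth = growth_rate * n * (1 - n / carrying_capacity)
        kill = kill_rate * drug_conc(t) * n
        n = max(n + dt * (growth - kill), 0.0)
        if (step + 1) % steps_per_day == 0:
            daily.append(n)
    return daily


def no_drug(t):
    return 0.0


def drug_days_10_to_20(t):
    # Hypothetical constant exposure during a 10-day treatment window.
    return 1.0 if 10 <= t < 20 else 0.0


# Hypothetical "patient-specific" parameters, chosen only for illustration.
params = dict(n0=1e6, growth_rate=0.2, carrying_capacity=1e9, kill_rate=0.5)

untreated = simulate_tumor(drug_conc=no_drug, days=30, **params)
treated = simulate_tumor(drug_conc=drug_days_10_to_20, days=30, **params)
print(f"day 30 untreated: {untreated[-1]:.3e} cells")
print(f"day 30 treated:   {treated[-1]:.3e} cells")
```

A single ordinary differential equation like this runs in milliseconds. The computational cost in practice comes from coupling many such terms across a three-dimensional tissue and fitting them to imaging data.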

In April 2017, writing in the Journal of The Royal Society Interface, Thomas Yankeelov and collaborators at UT Austin and Vanderbilt University showed that they can predict how brain tumors (gliomas) will grow in mice with greater accuracy than previous models by including factors like the mechanical forces acting on the cells and the tumor’s cellular heterogeneity.

To develop and implement their mathematically complex models, the center’s scientists use TACC’s supercomputers, which enable them to solve bigger problems than they otherwise could and reach solutions far faster.

Recently, the group has begun a clinical study to predict, after one treatment, how an individual’s cancer will progress, and use those predictions to plan the future course of treatment.

“There are not enough resources or patients to sort this problem out because there are too many variables. It would take until the end of time,” Yankeelov said. “But if you have a model that can recapitulate how tumors grow and respond to therapy, then it becomes a classic engineering optimization problem. ‘I have this much drug and this much time. What’s the best way to give it to minimize the number of tumor cells for the longest amount of time?’”
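The optimization framing in that quote can be sketched with a deliberately small example: given a fixed number of doses and a fixed time horizon, try every allowed schedule against a growth model and keep the one with the lowest total tumor burden. The model, the parameter values, and the `min_gap` spacing rule (a crude stand-in for a toxicity limit) are all assumptions made for illustration, not the group's clinical method.

```python
from itertools import combinations


def tumor_burden(dose_days, horizon=14, n0=1e6, growth_rate=0.2,
                 carrying_capacity=1e9, kill_fraction=0.6):
    """Total cell-days over the horizon for a given set of dosing days.

    Each day: apply the dose (if scheduled), record the cell count,
    then let the tumor grow logistically for one day.
    """
    n = n0
    total = 0.0
    for day in range(horizon):
        if day in dose_days:
            n *= 1 - kill_fraction
        total += n
        n += growth_rate * n * (1 - n / carrying_capacity)
    return total


def best_schedule(n_doses=3, horizon=14, min_gap=3):
    """Exhaustively search dosing days, requiring min_gap days between doses."""
    feasible = (
        days for days in combinations(range(horizon), n_doses)
        if all(b - a >= min_gap for a, b in zip(days, days[1:]))
    )
    return min(feasible, key=lambda days: tumor_burden(days, horizon))


schedule = best_schedule()
print("dose on days:", schedule)
print(f"cell-days: {tumor_burden(schedule):.3e}")
```

Exhaustive search is feasible here only because there are a few hundred candidate schedules. With continuous dose levels, spatially resolved tumor models, and uncertainty in every parameter, the search space grows quickly, which is where large-scale computing comes in.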

Computing at TACC helps Yankeelov accelerate his research. “We can solve problems in a few minutes that would take us three weeks to do using the resources at our old institution,” he said. “It’s phenomenal.”

Quantum Calculations

X-ray radiation is the most frequently used form of radiation therapy, but a new treatment is emerging that uses a beam of protons to kill cancer cells with minimal damage to surrounding tissues.

“As happens in cancer therapy, we know empirically that it works, but we don’t know why,” said Jorge A. Morales, a professor of chemistry at Texas Tech University and a leading proponent of the computational analysis of proton therapy. “To do experiments with human subjects is dangerous, so the best way is through computer simulation.”

Computational experiments can mimic the dynamics of the proton-cell interactions without causing damage to a patient and can reveal what happens when the proton beam and cells collide from start to finish, with atomic-level accuracy. Morales has been simulating proton-cell chemical reactions using quantum dynamics models on TACC’s Stampede supercomputer to investigate the fundamentals of the process.

His studies, reported in PLOS One in March 2017, as well as in Molecular Physics and Chemical Physics Letters (2015 and 2014, respectively), have determined the basic byproducts of protons colliding with water within the cell, and with nucleotides and clusters of DNA bases, the basic units of DNA. The studies shed light on how the protons and their water radiolysis products damage DNA.

Though fundamental in nature, the insights and data that Morales’ simulations produce help researchers understand proton cancer therapy at the quantum level, and help modulate factors like dosage and beam direction.

“These simulations will bring about a unique way to understand and control proton cancer therapy that, at a very low cost, will help to drastically improve the treatment of cancer patients without risking human subjects,” Morales said.

Computational Treatment Planning

Wei Liu, a researcher at the Mayo Clinic, also studies proton therapy, but he looks at the treatment from a clinical perspective.

In comparison with current radiation procedures, proton therapy saves healthy tissue in front of and behind the tumor. It is particularly effective when irradiating tumors near sensitive organs where stray beams can be particularly damaging.

However, the pinpoint accuracy required by the proton beam, which is its greatest advantage, means that it must be precisely calibrated and that discrepancies from the ideal (whether from device error, human error, or even patient breathing) must be taken into consideration.

Writing in Medical Physics in January 2017, Liu and his collaborators showed that their “chance-constrained model” was better at sparing organs at risk than current methods.

“Each time, we try to mathematically generate a good plan,” he said. “There are 25,000 variables or more, so generating a plan that is robust to these mistakes and can still get the proper dose distribution to the tumor is a large-scale optimization problem.”
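A chance constraint asks that a requirement hold with at least a stated probability rather than in every conceivable case. The one-dimensional sketch below illustrates only that idea and is not Liu's model: the tumor location, the setup-error distribution in `SHIFT_PROBS`, and the 95 percent confidence level are hypothetical, and a real plan optimizes tens of thousands of beam variables rather than a single margin.

```python
# Hypothetical setup-error distribution: shift (in voxels) -> probability.
SHIFT_PROBS = {-3: 0.02, -2: 0.08, -1: 0.20, 0: 0.40, 1: 0.20, 2: 0.08, 3: 0.02}

TUMOR = range(40, 60)   # voxels occupied by the tumor on a 1-D line


def coverage_probability(margin):
    """Probability that every tumor voxel stays inside the irradiated span.

    The beam covers the tumor plus `margin` voxels on each side; a setup
    error moves the whole tumor by `shift` voxels relative to the beam.
    """
    beam_lo, beam_hi = TUMOR.start - margin, TUMOR.stop + margin
    covered = 0.0
    for shift, prob in SHIFT_PROBS.items():
        if beam_lo <= TUMOR.start + shift and TUMOR.stop + shift <= beam_hi:
            covered += prob
    return covered


def smallest_robust_margin(confidence=0.95, max_margin=10):
    """Smallest margin whose coverage probability meets the chance constraint."""
    for margin in range(max_margin + 1):
        if coverage_probability(margin) >= confidence:
            return margin
    raise ValueError("no margin up to max_margin satisfies the constraint")


margin = smallest_robust_margin()
print(f"margin: {margin} voxels per side, "
      f"coverage probability {coverage_probability(margin):.2f}, "
      f"healthy voxels irradiated: {2 * margin}")
```

With these made-up numbers, a margin of two voxels per side covers the tumor in 96 percent of cases, while guaranteeing coverage in every case would require three. Accepting a small, quantified risk spares healthy tissue, which is the trade-off a chance-constrained formulation makes explicit.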

The researchers used the Lonestar5 supercomputer at TACC to generate treatment plans that minimize the risk and uncertainties involved in proton beam therapy.

“It’s very computationally expensive to generate a plan in a reasonable timeframe,” he continued. “Without a supercomputer, we can do nothing.”

Computational Trial Design

Another way researchers use TACC’s advanced computers is to design clinical trials that can better determine which combination of dosages will be most effective, specifically for the biological agents used in immunotherapy, which work very differently from chemotherapy and radiation.

Writing in the Journal of the Royal Statistical Society: Series C (Applied Statistics), Chunyan Cai, assistant professor of biostatistics at McGovern Medical School at The University of Texas Health Science Center at Houston (UTHealth), described her efforts using Lonestar5 to identify biologically optimal dose combinations for agents that target the PI3K/AKT/mTOR signaling pathway, which has been associated with several genetic aberrations related to the promotion of cancer.

Scanning electron micrograph of a human T lymphocyte (also called a T cell) from the immune system of a healthy donor. Immunotherapy fights cancer by supercharging the immune system’s natural defenses (including T cells) or contributing additional immune elements that can help the body kill cancer cells. HPC is helping researchers better understand how immunotherapeutic agents can be used effectively. [Courtesy: NIAID]

“Our research is motivated by a drug combination trial at MD Anderson Cancer Center for patients diagnosed with relapsed lymphoma,” Cai said. “The trial combined two novel biological agents that target two different components in the PI3K/AKT/mTOR signaling pathway.”

They investigated six different dose-toxicity and dose-efficacy scenarios and carried out 2,000 simulated trials for each of the designs.
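Simulating a trial design thousands of times shows how often it picks the right answer when the truth is known in advance. The sketch below does this for one invented scenario with a simple fixed-allocation rule. It is not Cai's adaptive design: the toxicity and efficacy probabilities in `SCENARIO`, the toxicity limit, and the cohort size are all assumptions chosen for illustration.

```python
import random
from collections import Counter

# Hypothetical scenario: (dose level of agent A, dose level of agent B)
# mapped to true (toxicity probability, efficacy probability).
SCENARIO = {
    (1, 1): (0.05, 0.20),
    (1, 2): (0.10, 0.35),
    (2, 1): (0.10, 0.60),
    (2, 2): (0.50, 0.65),
}
TOX_LIMIT = 0.30          # highest acceptable observed toxicity rate
PATIENTS_PER_COMBO = 20


def run_trial(rng):
    """Simulate one trial and return the selected combination (or None)."""
    best, best_eff = None, -1.0
    for combo, (p_tox, p_eff) in SCENARIO.items():
        tox = sum(rng.random() < p_tox for _ in range(PATIENTS_PER_COMBO))
        eff = sum(rng.random() < p_eff for _ in range(PATIENTS_PER_COMBO))
        tox_rate = tox / PATIENTS_PER_COMBO
        eff_rate = eff / PATIENTS_PER_COMBO
        # Keep the most effective combination among those deemed safe.
        if tox_rate <= TOX_LIMIT and eff_rate > best_eff:
            best, best_eff = combo, eff_rate
    return best


def operating_characteristics(n_trials=2000, seed=0):
    """Fraction of simulated trials in which each combination is selected."""
    rng = random.Random(seed)
    picks = Counter(run_trial(rng) for _ in range(n_trials))
    return {combo: count / n_trials for combo, count in picks.items()}


selection = operating_characteristics()
for combo, freq in sorted(selection.items(), key=lambda kv: -kv[1]):
    print(combo, f"{freq:.1%}")
```

In this scenario the combination (2, 1) is the true optimum, since (2, 2) is slightly more effective but far too toxic. The selection frequencies show how reliably the rule finds it. Repeating the exercise across several scenarios and competing designs multiplies the number of simulated trials, which is why such studies benefit from a system like Lonestar5.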

Based on those simulations, she concluded that “the design proposed has desirable operating characteristics in identifying the biologically optimal dose combination under various patterns of dose–toxicity and dose–efficacy relationships.”

The research is leading to new, safer and more effective ways to test combinations of immunotherapeutic agents.

Machine Learning

A final, and truly radical, way that researchers are using HPC for cancer research is through the application of machine and deep learning.

The Eberlin research group at UT Austin develops clinical applications of ambient mass spectrometry for cancer diagnosis. They create tools and techniques to assist surgeons in distinguishing between normal and cancer tissue during tumor resection operations.

To do so, they have had to develop statistical methods that can analyze and interpret large amounts of mass spectrometry data gathered from clinical samples.

Jonathan Young, a postdoctoral researcher in the group, is building machine learning classifiers to reliably predict whether a given tissue sample is cancerous or normal, and if it is indeed cancerous, which specific subtype the tumor belongs to.

Young uses the Maverick system at TACC, which contains a large number of NVIDIA GPUs, to develop and implement the machine learning algorithms. “The large memory capacity of Maverick is well suited for our extensive datasets, and the parallelization capability will aid in parameter sweeps during the training of classifiers,” Young said.
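A parameter sweep of the kind Young mentions trains the same classifier under several settings and keeps the one that performs best on held-out samples. The sketch below shows the pattern on synthetic data with a k-nearest-neighbors classifier, chosen here only for brevity. The article does not say which algorithms the group uses, and the "spectra" are random numbers rather than mass spectrometry measurements.

```python
import math
import random
from collections import Counter


def make_spectra(rng, n_per_class, n_peaks=5):
    """Synthetic 'spectra': normal tissue peaks near 0, cancer peaks near 3."""
    data = []
    for label, center in (("normal", 0.0), ("cancer", 3.0)):
        for _ in range(n_per_class):
            data.append(([rng.gauss(center, 1.0) for _ in range(n_peaks)], label))
    rng.shuffle(data)
    return data


def knn_predict(train, features, k):
    """Majority label among the k training spectra closest to `features`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], features))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, held_out, k):
    """Fraction of held-out spectra whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)


rng = random.Random(0)
train = make_spectra(rng, n_per_class=50)
held_out = make_spectra(rng, n_per_class=25)

# Parameter sweep over k. Each setting is independent of the others, so on
# an HPC system the iterations could run in parallel.
sweep = {k: accuracy(train, held_out, k) for k in (1, 3, 5, 7, 9)}
best_k = max(sweep, key=sweep.get)
print(sweep)
print("best k:", best_k)
```

Real sweeps cover many parameters at once over far larger datasets, so the number of independent training runs grows multiplicatively.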

Young will present his work at the American Society for Mass Spectrometry (ASMS) Annual Conference this June.

Another example of the application of machine learning to cancer can be found in the work of Daniel Lobo, an assistant professor of biology and computer science at the University of Maryland, Baltimore County (UMBC). He is using machine learning to map out the cellular communication networks that underlie cancer, and to design methods to disrupt them.

In their January 2017 paper in Scientific Reports, Lobo and collaborators showed that machine learning can uncover the cellular networks that determine pigmentation in tadpoles and reverse-engineer never-before-seen coloration. Their work was facilitated by Stampede, which enabled the team to run billions of simulations to identify models of the cellular network and the means of altering it.

Lobo’s lab is applying the method to cancer research to determine what type of interventions might stop metastasis in its tracks without damaging other cells.

“Traditional approaches like chemotherapy attack the cells that grow the most, but leave cells that are signaling others to grow and that may be the most important,” Lobo says. “We’re using machine learning to find out the communication networks between these cells and hopefully to discover a treatment that can cause the tumor to collapse.”

“Getting a true understanding, given the complexity of the information, without some assistance from machine learning, is probably hopeless,” said Michael Levin, Lobo’s collaborator. “I think it’s inevitable that we use machine learning to enrich scientific and biomedical discovery.”

From patient-specific treatments to immunology to drug discovery, advanced computing is accelerating the basic and applied science underlying our understanding of cancer and the development and application of cancer treatments.

If scientists are the rocket in the cancer moonshot, HPC processing power is the jet fuel.

About the Author

Aaron Dubrow joined TACC in October 2007 as the Science and Technology Writer with the responsibility of reporting on the myriad of research and development projects undertaken by TACC.
