Barcelona Supercomputing Center Contributes to Pan-Cancer Project

Feb. 7, 2020 — An international team has completed the most comprehensive study of whole cancer genomes to date, significantly improving our fundamental understanding of cancer and signposting new directions for its diagnosis and treatment. The Barcelona Supercomputing Center (BSC) has been involved since the initial stages of this project, contributing to the analysis of data, designing specific computing solutions for cancer genomics, and helping to answer specific questions about the biology of tumors.

The ICGC/TCGA[1] Pan-Cancer Analysis of Whole Genomes Project (PCAWG), known as the Pan-Cancer Project, a collaboration involving more than 1,300 scientists and clinicians from 37 countries, analyzed more than 2,600 genomes of 38 different tumour types, creating a huge resource of primary cancer genomes. This was then the launch-point for 16 working groups studying multiple aspects of cancer’s development, causation, progression and classification.

Previous studies focused on the 1 per cent of the genome that codes for proteins, analogous to mapping only the coasts of the continents. The Pan-Cancer Project explored in considerably greater detail the remaining 99 per cent of the genome, including key regions that control switching genes on and off, analogous to mapping the continents' interiors as well.

The Pan-Cancer Project has made available a comprehensive resource for cancer genomics research, including the raw genome sequencing data, software for cancer genome analysis, and multiple interactive websites exploring various aspects of the Pan-Cancer Project data.[2]

The role of High Performance Computing

Among the few data analysis centers around the world involved in the management and analysis of the Pan-Cancer data, the BSC has been the most active European supercomputing center. Involved since the early stages of this initiative, the BSC has contributed widely to the project: the primary analysis of genomes, including the detection of mutations; the generation of related computing resources; and the identification of non-functional gene copies that can have a direct impact on the onset and progression of tumors.

Two genome sequences from each donor (one from a healthy cell and one from a tumor cell) were analyzed by different methods. First, the two genomes of each donor were aligned to the reference human genome with the BWA method. The results of this alignment were then analyzed with the Sanger, DKFZ / EMBL and Broad / MUSE methods to compare the healthy and tumor genomes of the same patient and to detect possible mutations present in the tumor. The BSC performed 14% of these analyses. One of the four methods of analysis (Broad / MUSE) was run only in the United States because of intellectual property issues with its algorithm.

“Beyond the specific discoveries regarding the biological processes behind the origin and progression of tumors, this effort has resulted in one of the largest international efforts in biomedicine, and has set up the path for future world-wide initiatives in relation to cancer genomics, and for Personalized Medicine in general,” says Dr. David Torrents, ICREA research professor, leading the Computational Genomics groups at the BSC. “This project has placed our center among the top reference centers world-wide for data analysis in biomedicine and, in particular, for genomic oncology.”

New knowledge on cancer

The Pan-Cancer Project extended and advanced methods for analyzing cancer genomes, including the use of cloud computing. By applying these methods to its large dataset, it discovered new knowledge about cancer biology and confirmed important findings of previous studies. In 23 papers published today in Nature and its affiliated journals, the Pan-Cancer Project reports that:

The cancer genome is finite and knowable, but enormously complicated. By combining sequencing of the whole cancer genome with a suite of analysis tools, we can characterize every genetic change found in a cancer, all the processes that have generated those mutations, and even the order of key events during a cancer’s life history.
Researchers are close to cataloguing all of the biological pathways involved in cancer and having a fuller picture of their actions in the genome. At least one causal mutation was found in virtually all of the cancers analyzed and the processes that generate mutations were found to be hugely diverse — from changes in single DNA letters to the reorganization of whole chromosomes. Multiple novel regions of the genome controlling how genes switch on and off were identified as targets of cancer-causing mutations.
Through a new method of “carbon dating,” Pan-Cancer researchers discovered that it is possible to identify mutations which occurred years, sometimes even decades, before the tumour appears. This opens, theoretically, a window of opportunity for early cancer detection.
Tumour types can be identified accurately according to the patterns of genetic changes seen throughout the genome, potentially aiding the diagnosis of a patient’s cancer where conventional clinical tests could not identify its type. Knowledge of the exact tumour type could also help tailor treatments.

DOI: https://www.nature.com/articles/s41467-020-14367-0

1. ICGC – International Cancer Genome Consortium (https://icgc.org/); TCGA – The Cancer Genome Atlas (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga)

2. PCAWG Portal (dcc.icgc.org/pcawg); UCSC Xena (pcawg.xenahubs.net); Expression Atlas (www.ebi.ac.uk/gxa/home); PCAWG-Scout (pcawgscout.bsc.es); Chromothripsis Explorer (compbio.med.harvard.edu/chromothripsis)

Source: Barcelona Supercomputing Center