Background: Somatic mutations can be used as potential biomarkers for subtyping and predicting outcomes for cancer patients. However, cancer patients often carry many somatic mutations, which do not always concentrate on specific genomic loci, suggesting that the mutations may affect common pathways or gene interaction networks instead of common genes. The challenge is thus to identify the functional relationships among the mutations using multi-modal data. We developed a novel approach for integrating patient somatic mutation, transcriptome and clinical data to mine underlying functional gene groups that can be used to stratify cancer patients into groups with different clinical outcomes. Specifically, we use distance correlation metric to mine the correlations between expression profiles of mutated genes from different patients. Results: With this approach, we were able to cluster patients based on the functional relationships between the affected genes using their expression profiles, and to visualize the results using multi-dimensional scaling. Interestingly, we identified a stable subgroup of breast cancer patients that are highly enriched with ER-negative and triple-negative subtypes, and the somatic mutation genes they harbor were capable of acting as potential biomarkers to predict patient survival in several different breast cancer datasets, especially in ER-negative cohorts which has lacked reliable biomarkers. Conclusions: Our method provides a novel and promising approach for integrating genotyping and gene expression data in patient stratification in complex diseases.
- Breast cancer patient stratification
- Distance correlation
- Functional analysis of somatic mutation
- Integrative analysis