TY - JOUR
T1 - Explaining Gene Expression Using Twenty-One MicroRNAs
AU - Asiaee, Amir
AU - Abrams, Zachary B.
AU - Nakayiza, Samantha
AU - Sampath, Deepa
AU - Coombes, Kevin R.
N1 - Funding Information:
The authors would like to thank the Mathematical Biosciences Institute (MBI) at Ohio State University for partially supporting this research. MBI receives its funding through the National Science Foundation grant DMS 1440386
Publisher Copyright:
© Copyright 2020, Mary Ann Liebert, Inc., publishers 2020.
PY - 2020/7
Y1 - 2020/7
N2 - The transcriptome of a tumor contains detailed information about the disease. Although advances in sequencing technologies have generated larger data sets, there are still many questions about exactly how the transcriptome is regulated. One class of regulatory elements consists of microRNAs (or miRs), many of which are known to be associated with cancer. To better understand the relationships between miRs and cancers, we analyzed ∼9000 samples from 32 cancer types studied in The Cancer Genome Atlas. Our feature reduction algorithm found evidence for 21 biologically interpretable clusters of miRs, many of which were statistically associated with a specific type of cancer. Moreover, the clusters contain sufficient information to distinguish between most types of cancer. We then used linear models to measure, genome-wide, how much variation in gene expression could be explained by the 21 average expression values ("scores") of the clusters. Based on the ∼20,000 per-gene R2 values, we found that (1) mean differences between tissues of origin explain about 36% of variation; (2) the 21 miR cluster scores explain about 30% of the variation; and (3) combining tissue type with the miR scores explained about 56% of the total genome-wide variation in gene expression. Our analysis of poorly explained genes shows that they are enriched for olfactory receptor processes, sensory perception, and nervous system processing, which are necessary to receive and interpret signals from outside the organism. Therefore, it is reasonable for those genes to be always active and not get downregulated by miRs. In contrast, highly explained genes are characterized by genes translating to proteins necessary for transport, plasma membrane, or metabolic processes that are heavily regulated processes inside the cell. Other genetic regulatory elements such as transcription factors and methylation might help explain some of the remaining variation in gene expression.
AB - The transcriptome of a tumor contains detailed information about the disease. Although advances in sequencing technologies have generated larger data sets, there are still many questions about exactly how the transcriptome is regulated. One class of regulatory elements consists of microRNAs (or miRs), many of which are known to be associated with cancer. To better understand the relationships between miRs and cancers, we analyzed ∼9000 samples from 32 cancer types studied in The Cancer Genome Atlas. Our feature reduction algorithm found evidence for 21 biologically interpretable clusters of miRs, many of which were statistically associated with a specific type of cancer. Moreover, the clusters contain sufficient information to distinguish between most types of cancer. We then used linear models to measure, genome-wide, how much variation in gene expression could be explained by the 21 average expression values ("scores") of the clusters. Based on the ∼20,000 per-gene R2 values, we found that (1) mean differences between tissues of origin explain about 36% of variation; (2) the 21 miR cluster scores explain about 30% of the variation; and (3) combining tissue type with the miR scores explained about 56% of the total genome-wide variation in gene expression. Our analysis of poorly explained genes shows that they are enriched for olfactory receptor processes, sensory perception, and nervous system processing, which are necessary to receive and interpret signals from outside the organism. Therefore, it is reasonable for those genes to be always active and not get downregulated by miRs. In contrast, highly explained genes are characterized by genes translating to proteins necessary for transport, plasma membrane, or metabolic processes that are heavily regulated processes inside the cell. Other genetic regulatory elements such as transcription factors and methylation might help explain some of the remaining variation in gene expression.
KW - feature extraction
KW - gene expression prediction
KW - gene regulation
KW - mRNA
KW - microRNA
UR - http://www.scopus.com/inward/record.url?scp=85088487517&partnerID=8YFLogxK
U2 - 10.1089/cmb.2019.0321
DO - 10.1089/cmb.2019.0321
M3 - Article
C2 - 31794247
AN - SCOPUS:85088487517
SN - 1066-5277
VL - 27
SP - 1157
EP - 1170
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 7
ER -