TY - GEN
T1 - A comparison of combined P-value methods for gene differential expression using RNA-seq data
AU - Eteleeb, Abdallah M.
AU - Moseley, Hunter N.
AU - Rouchka, Eric C.
N1 - Publisher Copyright: Copyright © 2014 ACM.
PY - 2014/09/20
Y1 - 2014/09/20
AB - Detection of differentially expressed (DE) genes across conditions within RNA-Seq datasets yields insight into the differences in the biological processes at work under these conditions. Most methods available for discovering DE genes use statistical models of the data based on counts of reads that map to individual genes. However, the distribution of reads across different regions of a gene can be heterogeneous, so summarizing reads at the gene level may produce inaccurate results. If genes are broken down into smaller regions, such as exons or even smaller fragments, and DE analysis is performed on those regions, the significance of the overall region can be determined using combined p-values, which may improve the accuracy of detecting DE genes. We therefore analyzed the performance of widely used methods for combining p-values on publicly available RNA-Seq data. The combined p-value methods include Fisher's, Z-transform, Weighted Z-test, Minimum P-value, Logit, and Weighted-sum methods. On liver and kidney data, the Weighted Z-test performs best, detecting the highest number of truly DE genes; the weights assigned in the Weighted Z-test enable this approach to outperform Fisher's method. On the MAQC datasets, our analysis indicates that these methods perform similarly, with a slight edge to the Weighted Z-test and Fisher's method in detecting true DE genes. However, the Weighted-sum method clearly performs best in detecting true non-DE genes. Furthermore, in the MAQC datasets these methods appear to show an inverse relationship between their performance in detecting DE genes and their performance in detecting non-DE genes. These results point to difficulties in properly combining high and low p-values, which may be due to a lack of independence between tests. Thus, a modified Fisher's method may provide more accurate results in these circumstances.
KW - Combining p-values
KW - Differential expression
KW - RNA-seq
UR - http://www.scopus.com/inward/record.url?scp=84920742698&partnerID=8YFLogxK
U2 - 10.1145/2649387.2649421
DO - 10.1145/2649387.2649421
M3 - Conference contribution
AN - SCOPUS:84920742698
T3 - ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
SP - 417
EP - 425
BT - ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
PB - Association for Computing Machinery
T2 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014
Y2 - 20 September 2014 through 23 September 2014
ER -