A comparison of combined P-value methods for gene differential expression using RNA-seq data

Abdallah M. Eteleeb, Hunter N. Moseley, Eric C. Rouchka

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Detection of differentially expressed (DE) genes across conditions within RNA-Seq datasets yields insight into the differences in biological processes at work in these conditions. Most methods available for discovering DE genes use statistical methods that model the data based on counts of reads that map to individual genes. However, the distribution of reads across different regions of a gene can be heterogeneous, so summarizing reads at the gene level may provide inaccurate results. If genes are broken down into smaller regions, such as exons or even smaller fragments, and DE analysis is performed on those regions, the significance of the overall region can be determined using combined p-values, which may improve the accuracy of detecting DE genes. We therefore conducted an analysis of the performance of widely used methods for combining p-values using publicly available RNA-Seq data. The combined p-value methods include Fisher's, Z-transform, Weighted Z-test, Minimum P-value, Logit, and Weighted-sum methods. On liver and kidney data, the Weighted Z-test performs best, detecting the highest number of truly DE genes. The weights assigned in the Weighted Z-test enable this approach to outperform Fisher's method. On the MAQC datasets, our analysis indicates that these methods perform similarly, with a slight edge to the Weighted Z-test and Fisher's method in detecting true DE genes. However, the Weighted-sum method clearly performs best in detecting true non-DE genes. Furthermore, these methods appear to have an inverse relationship between their performance in detecting DE genes and their performance in detecting non-DE genes in the MAQC datasets. These results indicate issues in properly combining high and low p-values, which may be due to a lack of independence between tests. Thus, a modified Fisher's method may provide more accurate results in these circumstances.
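As an illustration of what combining region-level p-values into a single gene-level value can look like, the following is a minimal sketch of two of the methods named in the abstract, Fisher's method and the Weighted Z-test, written from their textbook definitions. It is not the authors' implementation. The function names, the example p-values, and the choice of weights are assumptions made for illustration; the abstract does not state how weights were assigned. Both functions assume independent tests and p-values strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df under H0."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # For even degrees of freedom (2k) the chi-square survival function
    # has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def weighted_z_combine(pvalues, weights=None):
    """Weighted Z-test: Z = sum(w_i * z_i) / sqrt(sum(w_i^2)), z_i = Phi^-1(1 - p_i)."""
    if weights is None:
        weights = [1.0] * len(pvalues)  # unweighted case (Z-transform)
    norm = NormalDist()
    z_scores = [norm.inv_cdf(1.0 - p) for p in pvalues]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return 1.0 - norm.cdf(numerator / denominator)


# Hypothetical per-exon p-values for one gene (illustrative values only).
exon_pvalues = [0.01, 0.20, 0.03]
# Hypothetical weights, e.g. proportional to exon length (an assumption).
exon_weights = [3.0, 1.0, 2.0]

gene_p_fisher = fisher_combine(exon_pvalues)
gene_p_weighted_z = weighted_z_combine(exon_pvalues, exon_weights)
```

With a single input p-value, both functions return that p-value unchanged, which is a quick sanity check on the formulas. In the Weighted Z-test, a region with a larger weight pulls the combined value toward its own p-value, which is one way weights could change the ranking of genes relative to Fisher's method.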

Original language: English
Title of host publication: ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Publisher: Association for Computing Machinery
Pages: 417-425
Number of pages: 9
ISBN (Electronic): 9781450328944
State: Published - Sep 20 2014
Event: 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014 - Newport Beach, United States
Duration: Sep 20 2014 → Sep 23 2014

Publication series

Name: ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

Conference

Conference: 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014
Country/Territory: United States
City: Newport Beach
Period: 09/20/14 → 09/23/14

Keywords

  • Combining p-values
  • Differential expression
  • RNA-seq
