TY - JOUR
T1 - Bias Correction in RNA-Seq Short-Read Counts Using Penalized Regression
AU - Dalpiaz, David
AU - He, Xuming
AU - Ma, Ping
PY - 2013/5
Y1 - 2013/5
AB - RNA-Seq produces tens of millions of short reads. When mapped to the genome and/or to the reference transcripts, RNA-Seq data can be summarized by a very large number of short-read counts. Accurate transcript quantification, such as gene expression calculation, relies on proper correction of sequence bias in the RNA-Seq short-read counts. We use a linear model for the sequence bias, which is much more flexible than the popular Poisson model. We fit the model using a penalized regression method, which allows for a significant dimension reduction. The algorithm is scalable for modeling RNA-Seq data. We demonstrate the excellent performance of our proposed method by applying it to real examples. The methods are implemented in open-source code, which is available in the R package lmbc.
KW - Gene expression
KW - LASSO
KW - Next-generation sequencing
KW - Penalized likelihood
KW - Regularization
KW - RNA-Seq
UR - https://www.scopus.com/pages/publications/84877074589
U2 - 10.1007/s12561-012-9057-6
DO - 10.1007/s12561-012-9057-6
M3 - Article
AN - SCOPUS:84877074589
SN - 1867-1764
VL - 5
SP - 88
EP - 99
JO - Statistics in Biosciences
JF - Statistics in Biosciences
IS - 1
ER -