TY - JOUR
T1 - Group spike-and-slab lasso generalized linear models for disease prediction and associated genes detection by incorporating pathway information
AU - Tang, Zaixiang
AU - Shen, Yueping
AU - Li, Yan
AU - Zhang, Xinyan
AU - Wen, Jia
AU - Qian, Chen'ao
AU - Zhuang, Wenzhuo
AU - Shi, Xinghua
AU - Yi, Nengjun
N1 - Publisher Copyright: © The Author 2017. Published by Oxford University Press. All rights reserved.
PY - 2018/3/15
Y1 - 2018/3/15
N2 - Motivation: Large-scale molecular data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, standard approaches for omics data analysis ignore the group structure among genes encoded in functional relationships or pathway information. Results: We propose new Bayesian hierarchical generalized linear models, called group spike-and-slab lasso GLMs, for predicting disease outcomes and detecting associated genes by incorporating large-scale molecular data and group structures. The proposed model employs a mixture double-exponential prior for coefficients that induces self-adaptive amounts of shrinkage on different coefficients. The group information is incorporated into the model by setting group-specific parameters. We have developed a fast and stable deterministic algorithm to fit the proposed hierarchical GLMs, which can perform variable selection within groups. We assess the performance of the proposed method on several simulated scenarios, by varying the overlap among groups, group size, number of non-null groups, and the correlation within groups. Compared with existing methods, the proposed method provides not only more accurate estimates of the parameters but also better prediction. We further demonstrate the application of the proposed procedure on three cancer datasets by utilizing pathway structures of genes. Our results show that the proposed method generates powerful models for predicting disease outcomes and detecting associated genes. Availability and implementation: The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
AB - Motivation: Large-scale molecular data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, standard approaches for omics data analysis ignore the group structure among genes encoded in functional relationships or pathway information. Results: We propose new Bayesian hierarchical generalized linear models, called group spike-and-slab lasso GLMs, for predicting disease outcomes and detecting associated genes by incorporating large-scale molecular data and group structures. The proposed model employs a mixture double-exponential prior for coefficients that induces self-adaptive amounts of shrinkage on different coefficients. The group information is incorporated into the model by setting group-specific parameters. We have developed a fast and stable deterministic algorithm to fit the proposed hierarchical GLMs, which can perform variable selection within groups. We assess the performance of the proposed method on several simulated scenarios, by varying the overlap among groups, group size, number of non-null groups, and the correlation within groups. Compared with existing methods, the proposed method provides not only more accurate estimates of the parameters but also better prediction. We further demonstrate the application of the proposed procedure on three cancer datasets by utilizing pathway structures of genes. Our results show that the proposed method generates powerful models for predicting disease outcomes and detecting associated genes. Availability and implementation: The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
UR - http://www.scopus.com/inward/record.url?scp=85044290267&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btx684
DO - 10.1093/bioinformatics/btx684
M3 - Article
C2 - 29077795
AN - SCOPUS:85044290267
SN - 1367-4803
VL - 34
SP - 901
EP - 910
JO - Bioinformatics
JF - Bioinformatics
IS - 6
ER -