TY - JOUR
T1 - Using multiple alignments to improve gene prediction
AU - Gross, Samuel S.
AU - Brent, Michael R.
PY - 2006/3
Y1 - 2006/3
N2 - The multiple species de novo gene prediction problem can be stated as follows: given an alignment of genomic sequences from two or more organisms, predict the location and structure of all protein-coding genes in one or more of the sequences. Here, we present a new system, N-SCAN (a.k.a. TWINSCAN 3.0), for addressing this problem. N-SCAN can model the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, and insertions and deletions. An implementation of N-SCAN was created and used to generate predictions for the entire human genome and the genome of the fruit fly Drosophila melanogaster. Analyses of the predictions reveal that N-SCAN's accuracy in both human and fly exceeds that of all previously published whole-genome de novo gene predictors.
KW - Comparative genomics
KW - Gene prediction
KW - Genome annotation
KW - Phylogenetic models
UR - http://www.scopus.com/inward/record.url?scp=33645979674&partnerID=8YFLogxK
U2 - 10.1089/cmb.2006.13.379
DO - 10.1089/cmb.2006.13.379
M3 - Article
C2 - 16597247
AN - SCOPUS:33645979674
SN - 1066-5277
VL - 13
SP - 379
EP - 393
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 2
ER -