Abstract
The multiple species de novo gene prediction problem can be stated as follows: given an alignment of genomic sequences from two or more organisms, predict the location and structure of all protein-coding genes in one or more of the sequences. Here, we present a new system, N-SCAN (a.k.a. TWINSCAN 3.0), for addressing this problem. N-SCAN has the ability to model dependencies between the aligned sequences, context-dependent substitution rates, and insertions and deletions in the sequences. An implementation of N-SCAN was created and used to generate predictions for the entire human genome. An analysis of the predictions reveals that N-SCAN's predictive accuracy in human exceeds that of all previously published whole-genome de novo gene predictors. In addition, predictions were generated for the genome of the fruit fly Drosophila melanogaster to demonstrate the applicability of N-SCAN to invertebrate gene prediction.
Original language | English |
---|---|
Pages (from-to) | 374-388 |
Number of pages | 15 |
Journal | Lecture Notes in Bioinformatics (Subseries of Lecture Notes in Computer Science) |
Volume | 3500 |
State | Published - 2005 |
Event | 9th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2005, Cambridge, MA, United States; May 14-18, 2005 |