Abstract

The multiple species de novo gene prediction problem can be stated as follows: given an alignment of genomic sequences from two or more organisms, predict the location and structure of all protein-coding genes in one or more of the sequences. Here, we present a new system, N-SCAN (a.k.a. TWINSCAN 3.0), for addressing this problem. N-SCAN can model the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, and insertions and deletions. An implementation of N-SCAN was created and used to generate predictions for the entire human genome and the genome of the fruit fly Drosophila melanogaster. Analyses of the predictions reveal that N-SCAN's accuracy in both human and fly exceeds that of all previously published whole-genome de novo gene predictors.

Original language: English
Pages (from-to): 379-393
Number of pages: 15
Journal: Journal of Computational Biology
Volume: 13
Issue number: 2
State: Published - Mar 2006

Keywords

  • Comparative genomics
  • Gene prediction
  • Genome annotation
  • Phylogenetic models

Title: Using multiple alignments to improve gene prediction