Abstract
De novo gene predictors are programs that predict the exon-intron structures of genes using the sequences of one or more genomes as their only input. In the past two years, dual-genome de novo predictors, which exploit local rates and patterns of mutation inferred from alignments between two genomes, have led to significant improvements in accuracy. Systems that exploit more than two genomes simultaneously have only recently begun to appear and are not yet competitive on practical tasks, but they offer the greatest hope for near-term improvements. Dual-genome de novo prediction for compact eukaryotic genomes such as those of Arabidopsis thaliana and Caenorhabditis elegans is already quite accurate. Although mammalian gene prediction lags behind in accuracy, it is yielding ever more useful results. Together with significant improvements in pseudogene detection methods, which have eliminated many false positives, these advances have brought us to the point where de novo gene predictions are being used as hypotheses to drive experimental annotation via systematic RT-PCR and sequencing.
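To make the idea of exploiting "local rates and patterns of mutation" concrete, here is a deliberately tiny sketch, not the method of any predictor discussed in this review. It assumes a two-state hidden Markov model (exon vs. intron) whose observations are columns of a pairwise genome alignment, encoded as `=` (identical bases), `x` (substitution) and `-` (indel). All state names, symbols and probabilities are hypothetical, chosen only so that exon columns look more conserved and less indel-prone than intron columns. Viterbi decoding then labels each column with its most probable state. Real gene predictors use many more states (for example, for splice signals and codon positions) and parameters estimated from data.

```python
import math

# Hypothetical toy parameters for a two-state HMM over alignment-column symbols:
# '=' identical bases, 'x' substitution, '-' gap (insertion or deletion).
STATES = ("exon", "intron")
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon": {"=": 0.85, "x": 0.13, "-": 0.02},    # assumed: conserved, few indels
    "intron": {"=": 0.55, "x": 0.30, "-": 0.15},  # assumed: faster-evolving
}


def viterbi(columns):
    """Return the most probable exon/intron label for each alignment column."""
    # Work in log space to avoid numerical underflow on long alignments.
    score = {s: math.log(START[s]) + math.log(EMIT[s][columns[0]]) for s in STATES}
    backpointers = []
    for sym in columns[1:]:
        best_prev_for = {}
        new_score = {}
        for s in STATES:
            # Choose the predecessor state that maximizes the score of entering s.
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            best_prev_for[s] = best_prev
            new_score[s] = (score[best_prev] + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][sym]))
        backpointers.append(best_prev_for)
        score = new_score
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for bp in reversed(backpointers):
        state = bp[state]
        path.append(state)
    return list(reversed(path))


# A conserved block, a divergent block full of substitutions and gaps, then a conserved block.
cols = "=" * 10 + "-x" * 5 + "=" * 10
print(viterbi(cols))  # the divergent middle block is labeled "intron"
```

With these made-up parameters, the poorly conserved, gap-rich stretch in the middle is decoded as intron and the conserved flanks as exon, which is the intuition behind using cross-genome conservation as evidence for coding sequence.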
Original language | English |
---|---|
Pages (from-to) | 264-272 |
Number of pages | 9 |
Journal | Current Opinion in Structural Biology |
Volume | 14 |
Issue number | 3 |
State | Published - Jun 2004 |
Keywords
- EHMM
- EST
- Evolutionary HMM
- Expressed sequence tag
- HMM
- Hidden Markov model
- Indels
- Insertions and deletions
- ORF
- Open reading frame
- PPT
- Poly-pyrimidine tract
- RT-PCR
- Reverse transcription-polymerase chain reaction
- TSS
- Transcription start sites
- UTR