De novo gene predictors are programs that predict the exon-intron structures of genes using the sequences of one or more genomes as their only input. In the past two years, dual-genome de novo predictors, which exploit local rates and patterns of mutation inferred from alignments between two genomes, have led to significant improvements in accuracy. Systems that exploit more than two genomes simultaneously have only recently begun to appear and are not yet competitive on practical tasks, but offer the greatest hope for near-term improvements. Dual-genome de novo prediction for compact eukaryotic genomes such as those of Arabidopsis thaliana and Caenorhabditis elegans is already quite accurate. Although mammalian gene prediction lags behind in accuracy, it is yielding ever more useful results. Together with significant improvements in pseudogene detection methods, which have eliminated many false positives, these advances have brought us to the point where de novo gene predictions are being used as hypotheses to drive experimental annotation via systematic RT-PCR and sequencing.

Original language: English
Pages (from-to): 264-272
Number of pages: 9
Journal: Current Opinion in Structural Biology
Issue number: 3
State: Published - Jun 2004


  • EHMM
  • EST
  • Evolutionary HMM
  • Expressed sequence tag
  • HMM
  • Hidden Markov model
  • Indels
  • Insertions and deletions
  • ORF
  • Open reading frame
  • PPT
  • Poly-pyrimidine tract
  • RT-PCR
  • Reverse transcription-polymerase chain reaction
  • TSS
  • Transcription start sites
  • UTR


Title: Recent advances in gene structure prediction
