TY - JOUR
T1 - Improving eukaryotic genome annotation using single molecule mRNA sequencing
AU - Magrini, Vincent
AU - Gao, Xin
AU - Rosa, Bruce A.
AU - McGrath, Sean
AU - Zhang, Xu
AU - Hallsworth-Pepin, Kymberlie
AU - Martin, John
AU - Hawdon, John
AU - Wilson, Richard K.
AU - Mitreva, Makedonka
N1 - Publisher Copyright: © 2018 The Author(s).
PY - 2018/3/1
Y1 - 2018/3/1
N2 - Background: The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy. Here we use these attributes to improve the genome annotation of the parasitic hookworm Ancylostoma ceylanicum using PacBio RNA-Seq. Results: We sequenced 192,888 circular consensus sequences (CCS) derived from cDNAs generated using the Clontech SMARTer system. These SMARTer-SMRT libraries were normalized and size-selected, providing a robust population of expressed structural genes for subsequent genome annotation. We demonstrate that genome annotation based on PacBio mRNA sequences improves on annotation using conventional sequencing-by-synthesis alone, identifying 1609 (9.2%) new genes, extending the length of 3965 (26.7%) genes, and increasing the total genomic exon length by 1.9 Mb (12.4%). Non-coding sequence representation (primarily from UTRs, based on dT reverse transcription priming) was particularly improved, increasing fifteen-fold in total length through gains in both the length and number of UTR exons. In addition, the UTR data provided by these CCS allowed for the identification of a novel SL2 splice leader sequence for A. ceylanicum and an increase in the number and proportion of functionally annotated genes. RNA-seq data also confirmed some of the newly annotated genes and gene features. Conclusion: Overall, PacBio data supported a significant improvement in gene annotation in this genome, and PacBio sequencing is an appealing alternative or complement to other transcript sequencing technologies for genome annotation.
AB - Background: The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy. Here we use these attributes to improve the genome annotation of the parasitic hookworm Ancylostoma ceylanicum using PacBio RNA-Seq. Results: We sequenced 192,888 circular consensus sequences (CCS) derived from cDNAs generated using the Clontech SMARTer system. These SMARTer-SMRT libraries were normalized and size-selected, providing a robust population of expressed structural genes for subsequent genome annotation. We demonstrate that genome annotation based on PacBio mRNA sequences improves on annotation using conventional sequencing-by-synthesis alone, identifying 1609 (9.2%) new genes, extending the length of 3965 (26.7%) genes, and increasing the total genomic exon length by 1.9 Mb (12.4%). Non-coding sequence representation (primarily from UTRs, based on dT reverse transcription priming) was particularly improved, increasing fifteen-fold in total length through gains in both the length and number of UTR exons. In addition, the UTR data provided by these CCS allowed for the identification of a novel SL2 splice leader sequence for A. ceylanicum and an increase in the number and proportion of functionally annotated genes. RNA-seq data also confirmed some of the newly annotated genes and gene features. Conclusion: Overall, PacBio data supported a significant improvement in gene annotation in this genome, and PacBio sequencing is an appealing alternative or complement to other transcript sequencing technologies for genome annotation.
KW - Ancylostoma ceylanicum
KW - Gene loci
KW - Genome annotation improvement
KW - Hookworm
KW - Pacific bioscience mRNA sequencing
UR - http://www.scopus.com/inward/record.url?scp=85042788278&partnerID=8YFLogxK
U2 - 10.1186/s12864-018-4555-7
DO - 10.1186/s12864-018-4555-7
M3 - Article
C2 - 29495964
AN - SCOPUS:85042788278
SN - 1471-2164
VL - 19
JO - BMC Genomics
JF - BMC Genomics
IS - 1
M1 - 172
ER -