TY - JOUR
T1 - Data storage and data re-use in taxonomy—the need for improved storage and accessibility of heterogeneous data
AU - Gemeinholzer, Birgit
AU - Vences, Miguel
AU - Beszteri, Bank
AU - Bruy, Teddy
AU - Felden, Janine
AU - Kostadinov, Ivaylo
AU - Miralles, Aurélien
AU - Nattkemper, Tim W.
AU - Printzen, Christian
AU - Renz, Jasmin
AU - Rybalka, Nataliya
AU - Schuster, Tanja
AU - Weibulat, Tanja
AU - Wilke, Thomas
AU - Renner, Susanne S.
N1 - Publisher Copyright: © 2020, The Author(s).
PY - 2020/3/1
Y1 - 2020/3/1
AB - The ability to rapidly generate and share molecular, visual, and acoustic data, and to compare them with existing information, and thereby to detect and name biological entities is fundamentally changing our understanding of evolutionary relationships among organisms and is also impacting taxonomy. Harnessing taxonomic data for rapid, automated species identification by machine learning tools or DNA metabarcoding techniques has great potential but will require their review, accessible storage, comprehensive comparison, and integration with prior knowledge and information. Currently, data production, management, and sharing in taxonomic studies are not keeping pace with these needs. Indeed, a survey of recent taxonomic publications provides evidence that few species descriptions in zoology and botany incorporate DNA sequence data. The use of modern high-throughput (-omics) data is so far the exception in alpha-taxonomy, although they are easily stored in GenBank and similar databases. By contrast, for the more routinely used image data, the problem is that they are rarely made available in openly accessible repositories. Improved sharing and re-using of both types of data requires institutions that maintain long-term data storage and capacity with workable, user-friendly but highly automated pipelines. Top priority should be given to standardization and pipeline development for the easy submission and storage of machine-readable data (e.g., images, audio files, videos, tables of measurements). The taxonomic community in Germany and the German Federation for Biological Data are researching options for a higher level of automation, improved linking among data submission and storage platforms, and for making existing taxonomic information more readily accessible.
KW - Accelerated species description
KW - Data repositories
KW - German Federation for Biological Data
KW - Image recognition
KW - Machine learning tools
KW - Metabarcoding
KW - Taxonomy
UR - http://www.scopus.com/inward/record.url?scp=85078251023&partnerID=8YFLogxK
U2 - 10.1007/s13127-019-00428-w
DO - 10.1007/s13127-019-00428-w
M3 - Article
AN - SCOPUS:85078251023
SN - 1439-6092
VL - 20
SP - 1
EP - 8
JO - Organisms Diversity and Evolution
JF - Organisms Diversity and Evolution
IS - 1
ER -