TY - JOUR
T1 - Computer vision applied to herbarium specimens of German trees
T2 - Testing the future utility of the millions of herbarium specimen images for automated identification
AU - Unger, Jakob
AU - Merhof, Dorit
AU - Renner, Susanne
N1 - Publisher Copyright: © 2016 The Author(s).
PY - 2016/11/16
Y1 - 2016/11/16
N2 - Background: Global Plants, a collaborative between JSTOR and some 300 herbaria, now contains about 2.48 million high-resolution images of plant specimens, a number that continues to grow, and collections that are digitizing their specimens at high resolution are allocating considerable resources to the maintenance of computer hardware (e.g., servers) and to acquiring digital storage space. We here apply machine learning, specifically the training of a Support-Vector-Machine, to classify specimen images into categories, ideally at the species level, using the 26 most common tree species in Germany as a test case. Results: We designed an analysis pipeline and classification system consisting of segmentation, normalization, feature extraction, and classification steps and evaluated the system in two test sets, one with 26 species, the other with 17, in each case using 10 images per species of plants collected between 1820 and 1995, which simulates the empirical situation that most named species are represented in herbaria and databases, such as JSTOR, by few specimens. We achieved 73.21% accuracy of species assignments in the larger test set, and 84.88% in the smaller test set. Conclusions: The results of this first application of a computer vision algorithm trained on images of herbarium specimens show that despite the problem of overlapping leaves, leaf-architectural features can be used to categorize specimens to species with good accuracy. Computer vision is poised to play a significant role in future rapid identification at least for frequently collected genera or species in the European flora.
AB - Background: Global Plants, a collaborative between JSTOR and some 300 herbaria, now contains about 2.48 million high-resolution images of plant specimens, a number that continues to grow, and collections that are digitizing their specimens at high resolution are allocating considerable resources to the maintenance of computer hardware (e.g., servers) and to acquiring digital storage space. We here apply machine learning, specifically the training of a Support-Vector-Machine, to classify specimen images into categories, ideally at the species level, using the 26 most common tree species in Germany as a test case. Results: We designed an analysis pipeline and classification system consisting of segmentation, normalization, feature extraction, and classification steps and evaluated the system in two test sets, one with 26 species, the other with 17, in each case using 10 images per species of plants collected between 1820 and 1995, which simulates the empirical situation that most named species are represented in herbaria and databases, such as JSTOR, by few specimens. We achieved 73.21% accuracy of species assignments in the larger test set, and 84.88% in the smaller test set. Conclusions: The results of this first application of a computer vision algorithm trained on images of herbarium specimens show that despite the problem of overlapping leaves, leaf-architectural features can be used to categorize specimens to species with good accuracy. Computer vision is poised to play a significant role in future rapid identification at least for frequently collected genera or species in the European flora.
KW - Automated identification
KW - Computer vision
KW - Herbarium specimens
KW - JSTOR
KW - Leaf shape
KW - Leaf venation
UR - https://www.scopus.com/pages/publications/84995469447
U2 - 10.1186/s12862-016-0827-5
DO - 10.1186/s12862-016-0827-5
M3 - Article
C2 - 27852219
AN - SCOPUS:84995469447
SN - 1471-2148
VL - 16
JO - BMC Evolutionary Biology
JF - BMC Evolutionary Biology
IS - 1
M1 - 248
ER -