TY - GEN
T1 - Insights into Analogy Completion from the Biomedical Domain
AU - Newman-Griffis, Denis
AU - Lai, Albert M.
AU - Fosler-Lussier, Eric
N1 - Publisher Copyright:
© 2017 Association for Computational Linguistics
PY - 2017
Y1 - 2017
N2 - Analogy completion has been a popular task in recent years for evaluating the semantic properties of word embeddings, but the standard methodology makes a number of assumptions about analogies that do not always hold, either in recent benchmark datasets or when expanding into other domains. Through an analysis of analogies in the biomedical domain, we identify three assumptions: that of a Single Answer for any given analogy, that the pairs involved describe the Same Relationship, and that each pair is Informative with respect to the other. We propose modifying the standard methodology to relax these assumptions by allowing for multiple correct answers, reporting MAP and MRR in addition to accuracy, and using multiple example pairs. We further present BMASS, a novel dataset for evaluating linguistic regularities in biomedical embeddings, and demonstrate that the relationships described in the dataset pose significant semantic challenges to current word embedding methods.
UR - http://www.scopus.com/inward/record.url?scp=85080445452&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85080445452
T3 - BioNLP 2017 - SIGBioMed Workshop on Biomedical Natural Language Processing, Proceedings of the 16th BioNLP Workshop
SP - 19
EP - 28
BT - BioNLP 2017 - SIGBioMed Workshop on Biomedical Natural Language Processing, Proceedings of the 16th BioNLP Workshop
PB - Association for Computational Linguistics (ACL)
T2 - 16th SIGBioMed Workshop on Biomedical Natural Language Processing, BioNLP 2017
Y2 - 4 August 2017
ER -