TY - JOUR
T1 - Comparison of UMLS terminologies to identify risk of heart disease using clinical notes
AU - Shivade, Chaitanya
AU - Malewadkar, Pranav
AU - Fosler-Lussier, Eric
AU - Lai, Albert M.
N1 - Funding Information:
Research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under Award Number R01LM011116. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Publisher Copyright:
© 2015 Elsevier Inc.
PY - 2015/12/1
Y1 - 2015/12/1
N2 - The second track of the 2014 i2b2 challenge asked participants to automatically identify risk factors for heart disease among diabetic patients using natural language processing techniques for clinical notes. This paper describes a rule-based system developed using a combination of regular expressions, concepts from the Unified Medical Language System (UMLS), and freely available resources from the community. With a performance (F1 = 90.7) that is significantly higher than the median (F1 = 87.20) and close to that of the top-performing system (F1 = 92.8), it was the best rule-based system among all submissions in the challenge. We also used this system to evaluate the utility of different UMLS terminologies for the challenge task. Of the 155 terminologies in the UMLS, 129 (76.78%) have no representation in the corpus. The Consumer Health Vocabulary had very good coverage of relevant concepts and was the most useful terminology for the challenge task. While segmenting notes into sections and lists has a significant impact on performance, identifying negations and the experiencer of the medical event yields negligible gain.
KW - Electronic health records
KW - Natural language processing
KW - Rule-based system
KW - Unified Medical Language System
UR - http://www.scopus.com/inward/record.url?scp=84945319813&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2015.08.025
DO - 10.1016/j.jbi.2015.08.025
M3 - Article
C2 - 26375493
AN - SCOPUS:84945319813
SN - 1532-0464
VL - 58
SP - S103-S110
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
ER -