TY - GEN
T1 - Inductive identification of functional status information and establishing a gold standard corpus
T2 - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
AU - Thieu, Thanh
AU - Camacho, Jonathan
AU - Ho, Pei Shu
AU - Porcino, Julia
AU - Ding, Min
AU - Nelson, Lisa
AU - Rasch, Elizabeth
AU - Zhou, Chunxiao
AU - Chan, Leighton
AU - Brandt, Diane
AU - Newman-Griffis, Denis
AU - Yuan, Ao
AU - Lai, Albert M.
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/12/15
Y1 - 2017/12/15
AB - The importance of functional status information (FSI) has become increasingly evident in recent years [1, 2]. However, the implementation, application, and normalization of FSI in health care and Electronic Health Records (EHRs) remain largely underexplored. The World Health Organization's International Classification of Functioning, Disability and Health (ICF) [3] is considered the international standard for describing and coding function and health states. Nevertheless, the ICF provides only a limited vocabulary for recognizing FSI descriptions, since its purpose is to organize concepts related to functioning rather than to provide a comprehensive terminology or a complete set of relations between concepts. While the free-text portion of EHRs may provide a more complete picture of health status, treatment, and progress, current Natural Language Processing (NLP) methods largely focus on extracting medical conditions (e.g., diagnoses and symptoms). The absence of a standardized functional terminology and the incompleteness of the ICF as a vocabulary source make it challenging to build an NLP system that extracts FSI from EHR free text. Our work takes the first step towards extracting FSI from free text by systematically identifying the structure of FSI related to Mobility, a key domain of the ICF and an important domain in the determination of work disability. Our interdisciplinary research group inductively evaluated examples extracted from over 1,200 Physical Therapy (PT) notes from the Clinical Center of the National Institutes of Health (NIH). This extensive work resulted in a nested entity structure composed of 2 entities, 3 sub-entities, 8 attributes, and 21 attribute values. Furthermore, we manually curated the first gold standard corpus of 200 double-annotated and 50 triple-annotated PT notes. Our inter-annotator agreement (IAA) averages a 97% F1-score on partial textual span matching, and Siegel & Castellan's kappa on attribute value matching ranges from 0.4 to 0.9. Such a semantically rich corpus of Mobility FSI is a valuable and promising resource for future statistical learning. Our method is also adaptable to other domains of the ICF.
KW - ICF
KW - annotation
KW - functional status information
KW - functioning
KW - manual curation
KW - natural language processing
KW - physical therapy
UR - http://www.scopus.com/inward/record.url?scp=85046273449&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2017.8218042
DO - 10.1109/BIBM.2017.8218042
M3 - Conference contribution
AN - SCOPUS:85046273449
T3 - Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
SP - 2319
EP - 2321
BT - Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
A2 - Yoo, Illhoi
A2 - Zheng, Jane Huiru
A2 - Gong, Yang
A2 - Hu, Xiaohua Tony
A2 - Shyu, Chi-Ren
A2 - Bromberg, Yana
A2 - Gao, Jean
A2 - Korkin, Dmitry
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 November 2017 through 16 November 2017
ER -