Inductive identification of functional status information and establishing a gold standard corpus: A case study on the Mobility domain

Thanh Thieu, Jonathan Camacho, Pei Shu Ho, Julia Porcino, Min Ding, Lisa Nelson, Elizabeth Rasch, Chunxiao Zhou, Leighton Chan, Diane Brandt, Denis Newman-Griffis, Ao Yuan, Albert M. Lai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

The importance of functional status information (FSI) has become increasingly evident in recent years [1, 2]. However, implementation, application, and normalization of FSI in health care and Electronic Health Records (EHRs) have been largely underexplored. The World Health Organization's International Classification of Functioning, Disability and Health (ICF) [3] is considered to be the international standard for describing and coding function and health states. Nevertheless, the ICF provides only a limited vocabulary for recognizing FSI descriptions, since its purpose is to organize concepts related to functioning rather than to provide a comprehensive terminology or a complete set of relations between concepts. While the free text portion of EHRs might provide a more complete picture of health status, treatment, and progress, current Natural Language Processing (NLP) methods largely focus on extracting medical conditions (e.g., diagnoses and symptoms). The absence of a standardized functional terminology and the incompleteness of the ICF as a vocabulary source make it challenging to build an NLP system to extract FSI from EHR free text. Our work takes the first step towards extraction of FSI from free text by systematically identifying the structure of FSI related to Mobility, a key domain of the ICF and an important domain in the determination of work disability. Our interdisciplinary research group inductively evaluated examples extracted from over 1,200 Physical Therapy (PT) notes from the Clinical Center of the National Institutes of Health (NIH). This extensive work resulted in a nested entity structure comprising 2 entities, 3 sub-entities, 8 attributes, and 21 attribute values. Furthermore, we have manually curated the first gold standard corpus of 200 double-annotated and 50 triple-annotated PT notes. Our inter-annotator agreement (IAA) averages 97% F1-score on partial textual span matching and ranges from 0.4 to 0.9 in Siegel & Castellan's kappa on attribute value matching. Such a rich semantic corpus of Mobility FSI is a valuable and promising resource for future statistical learning. Our method is also adaptable to other domains of the ICF.

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Editors: Illhoi Yoo, Jane Huiru Zheng, Yang Gong, Xiaohua Tony Hu, Chi-Ren Shyu, Yana Bromberg, Jean Gao, Dmitry Korkin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2319-2321
Number of pages: 3
ISBN (Electronic): 9781509030491
State: Published - Dec 15 2017
Event: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 - Kansas City, United States
Duration: Nov 13 2017 → Nov 16 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Volume: 2017-January

Conference

Conference: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Country/Territory: United States
City: Kansas City
Period: 11/13/17 → 11/16/17

Keywords

  • ICF
  • annotation
  • functional status information
  • functioning
  • manual curation
  • natural language processing
  • physical therapy
