Taming big data: An information extraction strategy for large clinical text corpora

  • Adi V. Gundlapalli
  • Guy Divita
  • Marjorie E. Carter
  • Andrew Redd
  • Matthew H. Samore
  • Kalpana Gupta
  • Barbara Trautner

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Scopus citation

Abstract

Concepts of interest for clinical and research purposes are not uniformly distributed in clinical text available in electronic medical records. The purpose of our study was to identify filtering techniques to select 'high yield' documents for increased efficacy and throughput. Using two large corpora of clinical text, we demonstrate the identification of 'high yield' document sets in two unrelated domains: homelessness and indwelling urinary catheters. For homelessness, the high yield set includes homeless program and social work notes. For urinary catheters, concepts were more prevalent in notes from hospitalized patients; nursing notes accounted for a majority of the high yield set. This filtering will enable customization and refining of information extraction pipelines to facilitate extraction of relevant concepts for clinical decision support and other uses.
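The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the keyword matching, the 0.5 threshold, the function names (`yield_by_note_type`, `high_yield_types`), and the toy notes are all assumptions made for illustration. The sketch measures, for each note type, the fraction of documents that mention a concept of interest, and keeps the note types whose fraction meets the threshold.

```python
from collections import defaultdict


def yield_by_note_type(documents, keywords):
    """Return {note_type: fraction of documents mentioning any keyword}.

    `documents` is an iterable of (note_type, text) pairs. Simple
    case-insensitive substring matching stands in for a real concept
    extractor (an assumption made for this sketch).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    lowered = [k.lower() for k in keywords]
    for note_type, text in documents:
        totals[note_type] += 1
        body = text.lower()
        if any(k in body for k in lowered):
            hits[note_type] += 1
    return {t: hits[t] / totals[t] for t in totals}


def high_yield_types(yields, threshold=0.5):
    """Return the note types whose yield is at or above the threshold."""
    return sorted(t for t, y in yields.items() if y >= threshold)


# Toy data (hypothetical), loosely modeled on the homelessness domain.
documents = [
    ("social work", "Patient reports being homeless since March."),
    ("social work", "Referred to housing program; currently living in shelter."),
    ("radiology", "Chest x-ray shows no acute process."),
    ("radiology", "No fracture identified."),
    ("nursing", "Patient resting comfortably."),
]
keywords = ["homeless", "shelter"]

yields = yield_by_note_type(documents, keywords)
selected = high_yield_types(yields, threshold=0.5)
```

In practice the threshold would be tuned against the cost of processing low-yield documents, and only documents of the selected note types would be passed to the full information extraction pipeline.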

Original language: English
Title of host publication: Enabling Health Informatics Applications
Editors: John Mantas, Mowafa S. Househ, Arie Hasman
Publisher: IOS Press
Pages: 175-178
Number of pages: 4
ISBN (Electronic): 9781614995371
State: Published - 2015
Event: 13th International Conference on Informatics, Management, and Technology in Healthcare, ICIMTH 2015 - Athens, Greece
Duration: Jul 9 2015 to Jul 11 2015

Publication series

Name: Studies in Health Technology and Informatics
Volume: 213
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365

Conference

Conference: 13th International Conference on Informatics, Management, and Technology in Healthcare, ICIMTH 2015
Country/Territory: Greece
City: Athens
Period: 07/9/15 to 07/11/15

Keywords

  • Big data
  • Information extraction
  • Natural language processing
