TY - JOUR
T1 - Repurposing the Clinical Record
T2 - Can an Existing Natural Language Processing System De-identify Clinical Notes?
AU - Morrison, Frances P.
AU - Li, Li
AU - Lai, Albert M.
AU - Hripcsak, George
N1 - Funding Information:
Research for natural language processing and continuing development of MedLEE were supported by R01 LM007659 and R01 LM008635 from the National Library of Medicine.
PY - 2009/1
Y1 - 2009/1
N2 - Electronic clinical documentation can be useful for activities such as public health surveillance, quality improvement, and research, but existing methods of de-identification may not provide sufficient protection of patient data. The general-purpose natural language processor MedLEE retains medical concepts while excluding the remaining text so, in addition to processing text into structured data, it may be able to provide a secondary benefit of de-identification. Without modifying the system, the authors tested the ability of MedLEE to remove protected health information (PHI) by comparing 100 outpatient clinical notes with the corresponding XML-tagged output. Of 809 instances of PHI, 26 (3.2%) were detected in output as a result of processing and identification errors. However, PHI in the output was highly transformed, much appearing as normalized terms for medical concepts, potentially making re-identification more difficult. The MedLEE processor may be a good enhancement to other de-identification systems, both removing PHI and providing coded data from clinical text.
AB - Electronic clinical documentation can be useful for activities such as public health surveillance, quality improvement, and research, but existing methods of de-identification may not provide sufficient protection of patient data. The general-purpose natural language processor MedLEE retains medical concepts while excluding the remaining text so, in addition to processing text into structured data, it may be able to provide a secondary benefit of de-identification. Without modifying the system, the authors tested the ability of MedLEE to remove protected health information (PHI) by comparing 100 outpatient clinical notes with the corresponding XML-tagged output. Of 809 instances of PHI, 26 (3.2%) were detected in output as a result of processing and identification errors. However, PHI in the output was highly transformed, much appearing as normalized terms for medical concepts, potentially making re-identification more difficult. The MedLEE processor may be a good enhancement to other de-identification systems, both removing PHI and providing coded data from clinical text.
UR - http://www.scopus.com/inward/record.url?scp=57149148103&partnerID=8YFLogxK
U2 - 10.1197/jamia.M2862
DO - 10.1197/jamia.M2862
M3 - Article
C2 - 18952938
AN - SCOPUS:57149148103
SN - 1067-5027
VL - 16
SP - 37
EP - 39
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 1
ER -