TY - JOUR
T1 - A review of approaches to identifying patient phenotype cohorts using electronic health records
AU - Shivade, Chaitanya
AU - Raghavan, Preethi
AU - Fosler-Lussier, Eric
AU - Embi, Peter J.
AU - Elhadad, Noemie
AU - Johnson, Stephen B.
AU - Lai, Albert M.
PY - 2014
Y1 - 2014
N2 - Objective: To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. Materials and methods: We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Results: Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. Discussion: We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. Conclusions: There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses.
UR - http://www.scopus.com/inward/record.url?scp=84894026081&partnerID=8YFLogxK
U2 - 10.1136/amiajnl-2013-001935
DO - 10.1136/amiajnl-2013-001935
M3 - Article
C2 - 24201027
AN - SCOPUS:84894026081
SN - 1067-5027
VL - 21
SP - 221
EP - 230
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 2
ER -