TY - JOUR
T1 - Simplified data science approach to extract social and behavioural determinants
T2 - A retrospective chart review
AU - Teng, Andrew
AU - Wilcox, Adam
PY - 2022/1/18
Y1 - 2022/1/18
N2 - Objectives: We aim to extract a subset of social factors from clinical notes using common text classification methods. Design: Retrospective chart review. Setting: We collaborated with a local level I trauma hospital located in an underserved area, where about 6.5% of the patient population is housing unstable, and extracted text notes related to various social determinants for acute care patients. Participants: Notes were retrospectively extracted from 43 798 acute care patients. Methods: We used only open-source Python packages to test simple text classification methods that could be easily generalised and implemented. We extracted social history text from various sources, such as admission and emergency department notes, over a 5-year timeframe and performed manual chart reviews to ensure data quality. We manually labelled the sentiment of the notes, treating each text entry independently. Four different models with two different feature selection methods (bag of words and bigrams) were used to classify and predict housing stability, tobacco use and alcohol use status for the extracted clinical text. Results: Applying open-source classification techniques gave positive results overall; the accuracy scores were 91.2%, 84.7% and 82.8% for housing stability, tobacco use and alcohol use, respectively. Our analysis had several limitations, including social factors not recorded because of patient condition, multiple copy-forward entries and shorthand. Additionally, it was difficult to translate degrees of tobacco and alcohol use. However, when compared with structured data sources, our classification approach on unstructured notes yielded more results for housing and alcohol use; tobacco use proved less fruitful for unstructured notes.
KW - biotechnology & bioinformatics
KW - health informatics
KW - history (see medical history)
KW - social medicine
UR - http://www.scopus.com/inward/record.url?scp=85123610001&partnerID=8YFLogxK
U2 - 10.1136/bmjopen-2020-048397
DO - 10.1136/bmjopen-2020-048397
M3 - Article
C2 - 35042703
AN - SCOPUS:85123610001
SN - 2044-6055
VL - 12
JO - BMJ Open
JF - BMJ Open
IS - 1
M1 - e048397
ER -