TY - GEN
T1 - Transformer-based Multi-target Regression on Electronic Health Records for Primordial Prevention of Cardiovascular Disease
AU - Poulain, Raphael
AU - Gupta, Mehak
AU - Foraker, Randi
AU - Beheshti, Rahmatollah
N1 - Funding Information:
The All of Us Research Program is supported by several grants from the National Institutes of Health. Our study was supported by NIH awards P20GM103446 and P20GM113125, and RWJF award 76778.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Machine learning algorithms have been widely used to capture the static and temporal patterns within electronic health records (EHRs). While many studies focus on the (primary) prevention of diseases, primordial prevention (preventing the factors that are known to increase the risk of a disease occurring) is still widely under-investigated. In this study, we propose a multi-target regression model leveraging transformers to learn the bidirectional representations of EHR data and predict the future values of 11 major modifiable risk factors of cardiovascular disease (CVD). Inspired by the proven results of pre-training in natural language processing studies, we apply the same principles on EHR data, dividing the training of our model into two phases: pre-training and fine-tuning. We use the fine-tuned transformer model in a 'multi-target regression' theme. Following this theme, we combine the 11 disjoint prediction tasks by adding shared and target-specific layers to the model and jointly train the entire model. We evaluate the performance of our proposed method on a large publicly available EHR dataset. Through various experiments, we demonstrate that the proposed method obtains a significant improvement (12.6% MAE on average across all 11 different outputs) over the baselines.
AB - Machine learning algorithms have been widely used to capture the static and temporal patterns within electronic health records (EHRs). While many studies focus on the (primary) prevention of diseases, primordial prevention (preventing the factors that are known to increase the risk of a disease occurring) is still widely under-investigated. In this study, we propose a multi-target regression model leveraging transformers to learn the bidirectional representations of EHR data and predict the future values of 11 major modifiable risk factors of cardiovascular disease (CVD). Inspired by the proven results of pre-training in natural language processing studies, we apply the same principles on EHR data, dividing the training of our model into two phases: pre-training and fine-tuning. We use the fine-tuned transformer model in a 'multi-target regression' theme. Following this theme, we combine the 11 disjoint prediction tasks by adding shared and target-specific layers to the model and jointly train the entire model. We evaluate the performance of our proposed method on a large publicly available EHR dataset. Through various experiments, we demonstrate that the proposed method obtains a significant improvement (12.6% MAE on average across all 11 different outputs) over the baselines.
KW - cardiovascular disease
KW - electronic health records
KW - multi-target regression
KW - prevention
KW - transformers
UR - http://www.scopus.com/inward/record.url?scp=85125184620&partnerID=8YFLogxK
U2 - 10.1109/BIBM52615.2021.9669441
DO - 10.1109/BIBM52615.2021.9669441
M3 - Conference contribution
AN - SCOPUS:85125184620
T3 - Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
SP - 726
EP - 731
BT - Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
A2 - Huang, Yufei
A2 - Kurgan, Lukasz
A2 - Luo, Feng
A2 - Hu, Xiaohua Tony
A2 - Chen, Yidong
A2 - Dougherty, Edward
A2 - Kloczkowski, Andrzej
A2 - Li, Yaohang
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
Y2 - 9 December 2021 through 12 December 2021
ER -