Abstract
Purpose: Data-driven harmonization can mitigate systematic confounding signals across imaging cohorts caused by variation in scanners and acquisition protocols. Because diffusion magnetic resonance imaging data are often acquired with different hardware and software, harmonization is essential for integrating these scattered datasets into a cohesive analysis with greater statistical power. Large-scale, multi-site studies of Alzheimer’s disease (AD), a neurodegenerative condition characterized by high data variability and complex pathology, must contend with both site-related and biological variation.

Approach: We learn lower-dimensional representations of structural connectivity that are invariant to imaging cohort, geographical location, scanner, and acquisition factors. We design a conditional variational autoencoder that creates latent representations with minimal information about imaging factors and maximal information related to patient cognitive status. With this model, we consolidate 9 cohorts and 35 unique imaging acquisitions (for a total of 38 imaging “sites”) into a cohesive dataset of 6956 persons (16.4% with mild cognitive impairment and 10.7% with AD) imaged over 1 to 16 sessions, for a total of 11,927 diffusion-weighted imaging sessions.

Results: These site-invariant representations remove significant (p < 0.05) site effects in 12 network connectivity measures of interest and improve the prediction of cognitive diagnosis (accuracy rises from 68% to 73%).

Conclusions: The proposed model yields reproducible precision across 15 data configurations. This approach demonstrates the effectiveness of representation learning in enhancing biological signals by mitigating acquisition-specific confounding factors in neuroimaging studies.
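The abstract does not specify the model's architecture or loss. As an illustration only, the sketch below shows the general shape of a conditional variational autoencoder objective under these assumptions: a linear encoder and decoder, a one-hot site code fed only to the decoder (so the latent vector z need not carry site information to reconstruct the input), and a linear classifier on z whose cross-entropy term encourages z to retain cognitive-status information. The class `TinyConditionalVAE`, its helper functions, and the weights `beta` and `gamma` are hypothetical. The sketch evaluates the loss for a single sample and omits training; decoder conditioning alone does not guarantee site invariance, and the authors' model may use additional terms not shown here.

```python
import math
import random


def matvec(W, v):
    # Multiply a matrix (stored as a list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init(rows, cols, rng, scale=0.1):
    # Small random Gaussian weights (hypothetical initialization).
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def one_hot(index, size):
    # Encode a site (or class) index as a one-hot vector.
    v = [0.0] * size
    v[index] = 1.0
    return v


class TinyConditionalVAE:
    """Hypothetical linear cVAE: the decoder is conditioned on a site code."""

    def __init__(self, n_features, n_latent, n_sites, n_classes, seed=0):
        self.rng = random.Random(seed)
        self.n_sites = n_sites
        self.W_mu = init(n_latent, n_features, self.rng)
        self.W_logvar = init(n_latent, n_features, self.rng)
        # Decoder input is [z, site one-hot].
        self.W_dec = init(n_features, n_latent + n_sites, self.rng)
        # Classifier head predicts cognitive status from z only.
        self.W_cls = init(n_classes, n_latent, self.rng)

    def loss(self, x, site, label, beta=1.0, gamma=1.0):
        mu = matvec(self.W_mu, x)
        logvar = matvec(self.W_logvar, x)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1).
        z = [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
             for m, lv in zip(mu, logvar)]
        # The site code is supplied to the decoder, so z need not encode site.
        x_hat = matvec(self.W_dec, z + one_hot(site, self.n_sites))
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        # KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
        kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                        for m, lv in zip(mu, logvar))
        # Cross-entropy of the classifier on z (log-sum-exp for stability).
        logits = matvec(self.W_cls, z)
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(v - top) for v in logits))
        ce = log_norm - logits[label]
        total = recon + beta * kl + gamma * ce
        return total, {"recon": recon, "kl": kl, "ce": ce}
```

For example, `TinyConditionalVAE(n_features=6, n_latent=2, n_sites=3, n_classes=3).loss(x, site=1, label=0)` returns the combined loss and its three components for a toy six-feature connectivity vector `x`. Raising `gamma` weights cognitive-status prediction more heavily relative to reconstruction.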
| Original language | English |
|---|---|
| Article number | 064001 |
| Journal | Journal of Medical Imaging |
| Volume | 12 |
| Issue number | 6 |
| State | Published: Nov 1, 2025 |
Keywords
- brain networks
- connectomics
- diffusion imaging
- machine learning
- multi-site