Harmonizing 10,000 connectomes: site-invariant representation learning for multi-site analysis of network connectivity and cognitive impairment

  • Nancy R. Newlin
  • Michael E. Kim
  • Praitayini Kanakaraj
  • Elyssa McMaster
  • Chloe Cho
  • Chenyu Gao
  • Timothy J. Hohman
  • Lori Beason-Held
  • Susan M. Resnick
  • Sid E. O’Bryant
  • Nicole Phillips
  • Robert C. Barber
  • David A. Bennett
  • Lisa L. Barnes
  • Sarah Biber
  • Sterling Johnson
  • Derek Archer
  • Zhiyuan Li
  • Lianrui Zuo
  • Daniel Moyer
  • Bennett A. Landman

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

Purpose: Data-driven harmonization can mitigate systematic confounding signals across imaging cohorts caused by variation in scanners and acquisition protocols. Because diffusion magnetic resonance imaging data are often acquired with different hardware and software, harmonization is essential for integrating these scattered datasets into a cohesive analysis with improved statistical power. Large-scale, multi-site studies of Alzheimer’s disease (AD), a neurodegenerative condition characterized by high data variability and complex pathology, must contend with both site-related and biological variation.

Approach: We learn lower-dimensional representations of structural connectivity that are invariant to imaging cohort, geographical location, scanner, and acquisition factors. We design a conditional variational autoencoder that creates latent representations with minimal information about imaging factors and maximal information related to patient cognitive status. With this model, we consolidate 9 cohorts and 35 unique imaging acquisitions (for a total of 38 imaging “sites”) into a cohesive dataset of 6956 persons (16.4% with mild cognitive impairment and 10.7% with AD), each imaged in 1 to 16 sessions, for a total of 11,927 diffusion-weighted imaging sessions.

Results: These site-invariant representations remove significant (p < 0.05) site effects in 12 network connectivity measures of interest and improve the prediction of cognitive diagnosis from 68% to 73% accuracy.

Conclusions: The proposed model yields reproducible precision across 15 data configurations. This approach demonstrates that representation learning can enhance biological signals by mitigating acquisition-specific confounding factors in neuroimaging studies.
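The abstract does not specify the network architecture, so the following is only a minimal NumPy sketch of the general idea of a conditional variational autoencoder (CVAE) loss for connectomes, not the authors' implementation. It assumes: each connectome is a symmetric N x N matrix flattened to its upper triangle; the encoder sees only the connectome, while the decoder also receives a one-hot site label (so the latent code z has no need to store site information); a Gaussian latent with a KL penalty; and an auxiliary softmax classifier on z to retain diagnosis-related information. All names (`vectorize_connectome`, `cvae_loss`, `beta`, `gamma`), the single linear layers, the dimensions, and the three diagnosis classes (e.g. cognitively normal, MCI, AD) are hypothetical choices for illustration. Any additional invariance penalty the paper may use is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)

def vectorize_connectome(adj):
    """Flatten the strict upper triangle of a symmetric N x N connectivity matrix."""
    iu = np.triu_indices(adj.shape[0], k=1)
    return adj[iu]

def init_linear(n_in, n_out):
    """Hypothetical single dense layer, stored as (weights, bias)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def linear(params, x):
    w, b = params
    return x @ w + b

def cvae_loss(params, x, site_onehot, diagnosis, beta=1.0, gamma=1.0):
    """Sketch of a CVAE objective: reconstruction + beta * KL + gamma * classifier CE."""
    enc_mu, enc_logvar, dec, clf = params
    # Encoder sees only the connectome, not the site label.
    mu = linear(enc_mu, x)
    logvar = linear(enc_logvar, x)
    # Reparameterization trick: z = mu + sigma * eps.
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    # Decoder is conditioned on site, so z need not encode site information.
    x_hat = linear(dec, np.concatenate([z, site_onehot], axis=1))
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian, averaged over the batch.
    kl = np.mean(-0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))
    # Auxiliary classifier on z encourages retaining diagnosis-related information.
    logits = linear(clf, z)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(diagnosis)), diagnosis])
    return recon + beta * kl + gamma * ce, (recon, kl, ce)

# Toy usage: 4 random symmetric "connectomes" with 10 nodes from 3 hypothetical sites.
n_nodes, n_sites, n_latent, n_classes = 10, 3, 8, 3
n_edges = n_nodes * (n_nodes - 1) // 2
params = (init_linear(n_edges, n_latent), init_linear(n_edges, n_latent),
          init_linear(n_latent + n_sites, n_edges), init_linear(n_latent, n_classes))
adj = rng.random((4, n_nodes, n_nodes))
adj = (adj + adj.transpose(0, 2, 1)) / 2
x = np.stack([vectorize_connectome(a) for a in adj])
sites = np.eye(n_sites)[[0, 1, 2, 0]]
dx = np.array([0, 1, 2, 0])
total, (recon, kl, ce) = cvae_loss(params, x, sites, dx)
```

In a real pipeline the parameters would be trained with an autodiff framework. After training, the harmonized representation would be the encoder mean `mu`, which carries no explicit site input. The weights `beta` and `gamma` trade off compression against diagnostic information and would need tuning.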

Original language: English
Article number: 064001
Journal: Journal of Medical Imaging
Volume: 12
Issue number: 6
State: Published, Nov 1 2025

Keywords

  • brain networks
  • connectomics
  • diffusion imaging
  • machine learning
  • multi-site
