Clinically acquired, multimodal and multi-site MRI datasets are widely used for neuro-oncology research. However, manual preprocessing of such data is extremely tedious and error-prone due to high intrinsic heterogeneity. Automatic standardization of such datasets is therefore important for data-hungry applications like deep learning. Despite rapid advances in MRI data acquisition and processing algorithms, only limited effort has been dedicated to automatic methodologies for standardization of such data. To address this challenge, we augment our previously developed Multimodal Glioma Analysis (MGA) pipeline with automation tools to achieve a processing scale suitable for big data applications. This new pipeline implements a natural language processing (NLP) based scan-type classifier, with features constructed from DICOM metadata using a bag-of-words model. The classifier automatically assigns one of 18 pre-defined scan types to each scan in an MRI study. Using the described data model, we trained three types of classifiers on the same dataset: logistic regression, linear SVM, and a multi-layer artificial neural network (ANN). Their performance was validated on four datasets from multiple sources. The ANN implementation achieved the highest performance, yielding an average classification accuracy of over 99%. We also built a Jupyter notebook-based graphical user interface (GUI) that runs MGA in semi-automatic mode, supporting progress tracking and quality control to ensure reproducibility of the resulting analyses. MGA has been implemented as a Docker container image to ensure portability and easy deployment. The application can run in single or batch study mode, using either local DICOM data or XNAT cloud storage.
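
As an illustration of the bag-of-words idea described in the abstract, the following minimal Python sketch builds token-count features from short metadata strings and fits a small multinomial logistic regression on them. It is not the MGA implementation: the training strings, the four scan-type labels, the tokenizer, and all function names are hypothetical stand-ins chosen for this example, and the abstract's 18 scan types and actual DICOM fields are not reproduced here.

```python
import math
import re
from collections import Counter

# Hypothetical examples: free-text metadata strings with assumed scan-type labels.
TRAIN = [
    ("AX T1 MPRAGE", "T1"), ("SAG T1 SE", "T1"),
    ("AX T2 TSE", "T2"), ("COR T2 FSE", "T2"),
    ("AX FLAIR", "FLAIR"), ("COR FLAIR FS", "FLAIR"),
    ("AX DWI B1000", "DWI"), ("EP2D DIFF DWI", "DWI"),
]
LABELS = sorted({label for _, label in TRAIN})


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_vocabulary(documents):
    """Map every distinct token in the documents to a feature index."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocab):
    """Bag-of-words vector: token counts, ignoring out-of-vocabulary tokens."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scores(x, weights, biases):
    """Linear score of feature vector x for each class."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(weights, biases)]


def train(X, y, n_classes, epochs=300, lr=0.1):
    """Fit multinomial logistic regression with plain stochastic gradient descent."""
    weights = [[0.0] * len(X[0]) for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, target in zip(X, y):
            probs = softmax(scores(x, weights, biases))
            for k in range(n_classes):
                # Gradient of the cross-entropy loss with respect to score k.
                grad = probs[k] - (1.0 if k == target else 0.0)
                biases[k] -= lr * grad
                for j, v in enumerate(x):
                    weights[k][j] -= lr * grad * v
    return weights, biases


VOCAB = build_vocabulary(text for text, _ in TRAIN)
X = [vectorize(text, VOCAB) for text, _ in TRAIN]
y = [LABELS.index(label) for _, label in TRAIN]
W, B = train(X, y, len(LABELS))


def classify(text):
    """Return the highest-scoring label for a metadata string."""
    s = scores(vectorize(text, VOCAB), W, B)
    return LABELS[max(range(len(s)), key=s.__getitem__)]
```

Calling `classify` on a new string tokenizes it with the same vocabulary, so tokens never seen during training are simply ignored. A production classifier would more likely rely on an established library and far more training data than this toy example.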

Original language: English
Title of host publication: Medical Imaging 2020
Subtitle of host publication: Imaging Informatics for Healthcare, Research, and Applications
Editors: Po-Hao Chen, Thomas M. Deserno
ISBN (Electronic): 9781510634039
State: Published - 2020
Event: Medical Imaging 2020: Imaging Informatics for Healthcare, Research, and Applications - Houston, United States
Duration: Feb 16 2020 to Feb 17 2020

Publication series

Name: Progress in Biomedical Optics and Imaging - Proceedings of SPIE
ISSN (Print): 1605-7422


Conference: Medical Imaging 2020: Imaging Informatics for Healthcare, Research, and Applications
Country/Territory: United States


  • Clinical MRI
  • Docker
  • Jupyter Notebook
  • MRI scan classifier
  • Natural Language Processing
  • Neuro-oncology imaging
  • Translational research


Dive into the research topics of 'Preprocessing of clinical neuro-oncology MRI studies for big data applications'. Together they form a unique fingerprint.
