TY - JOUR
T1 - Contrastive Cross-Modal Pre-Training
T2 - A General Strategy for Small Sample Medical Imaging
AU - Liang, Gongbo
AU - Greenwell, Connor
AU - Zhang, Yu
AU - Xing, Xin
AU - Wang, Xiaoqin
AU - Kavuluru, Ramakanth
AU - Jacobs, Nathan
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2022/4/1
Y1 - 2022/4/1
AB - A key challenge in training neural networks for a given medical imaging task is the difficulty of obtaining a sufficient number of manually labeled examples. In contrast, textual imaging reports are often readily available in medical records and contain rich but unstructured interpretations written by experts as part of standard clinical practice. We propose using these textual reports as a form of weak supervision to improve the image interpretation performance of a neural network without requiring additional manually labeled examples. We use an image-text matching task to train a feature extractor and then fine-tune it in a transfer learning setting for a supervised task using a small labeled dataset. The end result is a neural network that automatically interprets imagery without requiring textual reports during inference. We evaluate our method on three classification tasks and find consistent performance improvements, reducing the need for labeled data by 67%-98%.
KW - Annotation-efficient modeling
KW - convolutional neural network
KW - pre-training
KW - text-image matching
UR - http://www.scopus.com/inward/record.url?scp=85114717993&partnerID=8YFLogxK
U2 - 10.1109/JBHI.2021.3110805
DO - 10.1109/JBHI.2021.3110805
M3 - Article
C2 - 34495856
AN - SCOPUS:85114717993
SN - 2168-2194
VL - 26
SP - 1640
EP - 1649
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
IS - 4
ER -