TY - JOUR
T1 - Synergizing Deep Learning-Enabled Preprocessing and Human–AI Integration for Efficient Automatic Ground Truth Generation
AU - Collazo, Christopher
AU - Vargas, Ian
AU - Cara, Brendon
AU - Weinheimer, Carla J.
AU - Grabau, Ryan P.
AU - Goldgof, Dmitry
AU - Hall, Lawrence
AU - Wickline, Samuel A.
AU - Pan, Hua
N1 - Publisher Copyright:
© 2024 by the authors.
PY - 2024/5
Y1 - 2024/5
N2 - The progress of incorporating deep learning in the field of medical image interpretation has been greatly hindered due to the tremendous cost and time associated with generating ground truth for supervised machine learning, alongside concerns about the inconsistent quality of images acquired. Active learning offers a potential solution to these problems of expanding dataset ground truth by algorithmically choosing the most informative samples for ground truth labeling. Still, this effort incurs the costs of human labeling, which needs minimization. Furthermore, automatic labeling approaches employing active learning often exhibit overfitting tendencies while selecting samples closely aligned with the training set distribution and excluding out-of-distribution samples, which could potentially improve the model’s effectiveness. We propose that the majority of out-of-distribution instances can be attributed to inconsistent cross images. Since the FDA approved the first whole-slide image system for medical diagnosis in 2017, whole-slide images have provided enriched critical information to advance the field of automated histopathology. Here, we exemplify the benefits of a novel deep learning strategy that utilizes high-resolution whole-slide microscopic images. We quantitatively assess and visually highlight the inconsistencies within the whole-slide image dataset employed in this study. Accordingly, we introduce a deep learning-based preprocessing algorithm designed to normalize unknown samples to the training set distribution, effectively mitigating the overfitting issue. Consequently, our approach significantly increases the amount of automatic region-of-interest ground truth labeling on high-resolution whole-slide images using active deep learning. We accept 92% of the automatic labels generated for our unlabeled data cohort, expanding the labeled dataset by 845%. Additionally, we demonstrate expert time savings of 96% relative to manual expert ground-truth labeling.
AB - The progress of incorporating deep learning in the field of medical image interpretation has been greatly hindered due to the tremendous cost and time associated with generating ground truth for supervised machine learning, alongside concerns about the inconsistent quality of images acquired. Active learning offers a potential solution to these problems of expanding dataset ground truth by algorithmically choosing the most informative samples for ground truth labeling. Still, this effort incurs the costs of human labeling, which needs minimization. Furthermore, automatic labeling approaches employing active learning often exhibit overfitting tendencies while selecting samples closely aligned with the training set distribution and excluding out-of-distribution samples, which could potentially improve the model’s effectiveness. We propose that the majority of out-of-distribution instances can be attributed to inconsistent cross images. Since the FDA approved the first whole-slide image system for medical diagnosis in 2017, whole-slide images have provided enriched critical information to advance the field of automated histopathology. Here, we exemplify the benefits of a novel deep learning strategy that utilizes high-resolution whole-slide microscopic images. We quantitatively assess and visually highlight the inconsistencies within the whole-slide image dataset employed in this study. Accordingly, we introduce a deep learning-based preprocessing algorithm designed to normalize unknown samples to the training set distribution, effectively mitigating the overfitting issue. Consequently, our approach significantly increases the amount of automatic region-of-interest ground truth labeling on high-resolution whole-slide images using active deep learning. We accept 92% of the automatic labels generated for our unlabeled data cohort, expanding the labeled dataset by 845%. Additionally, we demonstrate expert time savings of 96% relative to manual expert ground-truth labeling.
KW - active deep learning
KW - convolutional neural network
KW - deep learning-based preprocessing
KW - ground truth
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85194401205&partnerID=8YFLogxK
U2 - 10.3390/bioengineering11050434
DO - 10.3390/bioengineering11050434
M3 - Article
C2 - 38790302
AN - SCOPUS:85194401205
SN - 2306-5354
VL - 11
JO - Bioengineering
JF - Bioengineering
IS - 5
M1 - 434
ER -