Multi-view classification with limited sample size and data augmentation is a very common machine learning (ML) problem in medicine. For this setting, a two-stage representation-learning approach based on a triplet network has been proposed. However, effectively training the representation network and verifying that its features are suitable for the subsequent classifier remain unsolved problems. Although typical distance-based metrics used during training capture the overall class separability of the features, good performance on these metrics does not always lead to optimal classification. Consequently, an exhaustive search over all feature-classifier combinations is required to find the best end result. To overcome this challenge, we developed a novel nearest-neighbor (NN) validation strategy based on the triplet metric. The strategy is supported by a theoretical foundation that selects the features achieving a lower bound on the best attainable end performance. It also offers a transparent way to identify whether the features or the classifier should be improved, avoiding the need for repeated tuning. Our evaluations on real-world medical imaging tasks (i.e., radiation therapy delivery error prediction and sarcoma survival prediction) show that our strategy is superior to common deep representation learning baselines [i.e., autoencoder (AE) and softmax]. The strategy also addresses feature interpretability, enabling more holistic feature creation so that medical experts can focus on specifying relevant data rather than on tedious feature engineering.
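The nearest-neighbor validation idea can be illustrated as a leave-one-out 1-NN check on the learned embeddings: each sample is assigned the label of its closest neighbor under the same distance the triplet loss optimizes, and the resulting accuracy serves as a proxy for downstream classifier performance. This is a minimal sketch, not the paper's exact procedure; the function name `nn_validation_accuracy` and the Euclidean metric are assumptions.

```python
import numpy as np

def nn_validation_accuracy(embeddings, labels):
    """Leave-one-out 1-NN accuracy in the embedding space.

    A simple proxy (illustrative only) for how well triplet-learned
    features may serve a subsequent classifier: each sample receives
    the label of its nearest neighbor by Euclidean distance, the
    metric a standard triplet loss optimizes, excluding itself.
    """
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    # Pairwise squared Euclidean distances between all embeddings.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Exclude self-matches so each sample's neighbor is another sample.
    np.fill_diagonal(d2, np.inf)
    nearest = d2.argmin(axis=1)
    return float((y[nearest] == y).mean())
```

With two well-separated clusters, the leave-one-out 1-NN accuracy is 1.0; as classes overlap in the embedding space, the score drops, signaling that the features (rather than the classifier) need improvement.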
|Journal|IEEE Transactions on Neural Networks and Learning Systems|
|State|Accepted/In press - 2021|
- Medical data classification
- multi-view learning
- representation learning
- transfer learning
- metric learning