TY - GEN
T1 - Correlative hierarchical clustering-based low-rank dimensionality reduction of radiomics-driven phenotype in non-small cell lung cancer
AU - Yousefi, Bardia
AU - Jahani, Nariman
AU - Lariviere, Michael J.
AU - Cohen, Eric
AU - Hsieh, Meng Kang
AU - Luna, José Marcio
AU - Chitalia, Rhea D.
AU - Thompson, Jeffrey C.
AU - Carpenter, Erica L.
AU - Katz, Sharyn I.
AU - Kontos, Despina
N1 - Funding Information:
Research reported in this presentation was supported by the National Cancer Institute of the National Institutes of Health under Award UM1CA221939. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Publisher Copyright:
© 2019 SPIE.
PY - 2019
Y1 - 2019
N2 - Background: Lung cancer is one of the most common cancers in the United States and the most fatal, with 142,670 deaths in 2019. Accurately determining tumor response is critical to clinical treatment decisions, ultimately impacting patient survival. To better differentiate between non-small cell lung cancer (NSCLC) responders and non-responders to therapy, radiomic analysis is emerging as a promising approach to identify associated imaging features undetectable by the human eye. However, the plethora of variables extracted from an image may actually undermine the performance of computer-aided prognostic assessment, a problem known as the curse of dimensionality. In the present study, we show that correlative-driven hierarchical clustering improves high-dimensional radiomics-based feature selection and dimensionality reduction, ultimately predicting overall survival in NSCLC patients. Methods: To select features for high-dimensional radiomics data, a correlation-incorporated hierarchical clustering algorithm automatically categorizes features into several groups. The truncation distance in the resulting dendrogram is used to control the categorization of the features, initiating low-rank dimensionality reduction in each cluster, and providing descriptive features for Cox proportional hazards (CPH)-based survival analysis. Using a publicly available NSCLC radiogenomic dataset of 204 patients' CT images, 429 established radiomics features were extracted. Low-rank dimensionality reduction via principal component analysis (PCA) was employed (o= o, o < o) to find the representative components of each cluster of features and calculate cluster robustness using the relative weighted consistency metric. Results: Hierarchical clustering categorized radiomic features into several groups without prior initialization of the number of clusters, using the correlation distance metric (as a function) to truncate the resulting dendrogram at different distances. The dimensionality was reduced from 429 to 67 features (for a truncation distance of 0.1). The robustness of the features within clusters varied from -1.12 to -30.02 for truncation distances of 0.1 to 1.8, respectively, indicating that robustness decreases with increasing truncation distance, when a smaller number of feature classes (i.e., clusters) is selected. The best multivariate CPH survival model had a C-statistic of 0.71 for a truncation distance of 0.1, outperforming conventional PCA approaches by 0.04, even when the same number of principal components was considered for feature dimensionality. Conclusions: The truncation distance of the correlative hierarchical clustering algorithm is directly associated with the robustness of the selected feature clusters, and the algorithm can effectively reduce feature dimensionality while improving outcome prediction.
AB - Background: Lung cancer is one of the most common cancers in the United States and the most fatal, with 142,670 deaths in 2019. Accurately determining tumor response is critical to clinical treatment decisions, ultimately impacting patient survival. To better differentiate between non-small cell lung cancer (NSCLC) responders and non-responders to therapy, radiomic analysis is emerging as a promising approach to identify associated imaging features undetectable by the human eye. However, the plethora of variables extracted from an image may actually undermine the performance of computer-aided prognostic assessment, a problem known as the curse of dimensionality. In the present study, we show that correlative-driven hierarchical clustering improves high-dimensional radiomics-based feature selection and dimensionality reduction, ultimately predicting overall survival in NSCLC patients. Methods: To select features for high-dimensional radiomics data, a correlation-incorporated hierarchical clustering algorithm automatically categorizes features into several groups. The truncation distance in the resulting dendrogram is used to control the categorization of the features, initiating low-rank dimensionality reduction in each cluster, and providing descriptive features for Cox proportional hazards (CPH)-based survival analysis. Using a publicly available NSCLC radiogenomic dataset of 204 patients' CT images, 429 established radiomics features were extracted. Low-rank dimensionality reduction via principal component analysis (PCA) was employed (o= o, o < o) to find the representative components of each cluster of features and calculate cluster robustness using the relative weighted consistency metric. Results: Hierarchical clustering categorized radiomic features into several groups without prior initialization of the number of clusters, using the correlation distance metric (as a function) to truncate the resulting dendrogram at different distances. The dimensionality was reduced from 429 to 67 features (for a truncation distance of 0.1). The robustness of the features within clusters varied from -1.12 to -30.02 for truncation distances of 0.1 to 1.8, respectively, indicating that robustness decreases with increasing truncation distance, when a smaller number of feature classes (i.e., clusters) is selected. The best multivariate CPH survival model had a C-statistic of 0.71 for a truncation distance of 0.1, outperforming conventional PCA approaches by 0.04, even when the same number of principal components was considered for feature dimensionality. Conclusions: The truncation distance of the correlative hierarchical clustering algorithm is directly associated with the robustness of the selected feature clusters, and the algorithm can effectively reduce feature dimensionality while improving outcome prediction.
KW - Cox proportional hazard (CPH) model
KW - Dimensionality reduction
KW - Feature robustness
KW - Feature selection
KW - Hierarchical clustering
KW - Non-small cell lung cancer
KW - Radiomic features
KW - Survival analysis
UR - http://www.scopus.com/inward/record.url?scp=85068571854&partnerID=8YFLogxK
U2 - 10.1117/12.2515609
DO - 10.1117/12.2515609
M3 - Conference contribution
AN - SCOPUS:85068571854
T3 - Progress in Biomedical Optics and Imaging - Proceedings of SPIE
BT - Medical Imaging 2019
A2 - Chen, Po-Hao
A2 - Bak, Peter R.
PB - SPIE
T2 - Medical Imaging 2019: Imaging Informatics for Healthcare, Research, and Applications
Y2 - 17 February 2019 through 18 February 2019
ER -