TY - GEN
T1 - Stochastic neighbor compression
AU - Kusner, Matt J.
AU - Tyree, Stephen
AU - Weinberger, Kilian
AU - Agrawal, Kunal
N1 - Publisher Copyright:
Copyright © (2014) by the International Machine Learning Society (IMLS). All rights reserved.
PY - 2014
Y1 - 2014
N2 - We present Stochastic Neighbor Compression (SNC), an algorithm to compress a dataset for the purpose of k-nearest neighbor (kNN) classification. Given training data, SNC learns a much smaller synthetic data set that minimizes the stochastic 1-nearest neighbor classification error on the training data. This approach has several appealing properties: due to its small size, the compressed set speeds up kNN testing drastically (up to several orders of magnitude, in our experiments); it makes the kNN classifier substantially more robust to label noise; on 4 of 7 data sets it yields lower test error than kNN on the entire training set, even at compression ratios as low as 2%; finally, the SNC compression leads to impressive speed-ups over kNN even when kNN and SNC are both used with ball-tree data structures, hashing, and LMNN dimensionality reduction, demonstrating that it is complementary to existing state-of-the-art algorithms to speed up kNN classification and leads to substantial further improvements.
AB - We present Stochastic Neighbor Compression (SNC), an algorithm to compress a dataset for the purpose of k-nearest neighbor (kNN) classification. Given training data, SNC learns a much smaller synthetic data set that minimizes the stochastic 1-nearest neighbor classification error on the training data. This approach has several appealing properties: due to its small size, the compressed set speeds up kNN testing drastically (up to several orders of magnitude, in our experiments); it makes the kNN classifier substantially more robust to label noise; on 4 of 7 data sets it yields lower test error than kNN on the entire training set, even at compression ratios as low as 2%; finally, the SNC compression leads to impressive speed-ups over kNN even when kNN and SNC are both used with ball-tree data structures, hashing, and LMNN dimensionality reduction, demonstrating that it is complementary to existing state-of-the-art algorithms to speed up kNN classification and leads to substantial further improvements.
UR - https://www.scopus.com/pages/publications/84919880581
M3 - Conference contribution
AN - SCOPUS:84919880581
T3 - 31st International Conference on Machine Learning, ICML 2014
SP - 2051
EP - 2059
BT - 31st International Conference on Machine Learning, ICML 2014
PB - International Machine Learning Society (IMLS)
T2 - 31st International Conference on Machine Learning, ICML 2014
Y2 - 21 June 2014 through 26 June 2014
ER -