Stochastic neighbor compression

  • Matt J. Kusner
  • , Stephen Tyree
  • , Kilian Weinberger
  • , Kunal Agrawal

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

28 Scopus citations

Abstract

We present Stochastic Neighbor Compression (SNC), an algorithm to compress a dataset for the purpose of fc-nearest neighbor (fcNN) classification. Given training data, SNC learns a much smaller synthetic data set, that minimizes the stochastic 1-nearest neighbor classification error on the training data. This approach has several appealing properties: due to its small size, the compressed set speeds up fcNN testing drastically (up to several orders of magnitude, in our experiments); it makes the fcNN classifier substantially more robust to label noise; on 4 of 7 data sets it yields lower test error than fcNN on the entire training set, even at compression ratios as low as 2%; finally, the SNC compression leads to impressive speed ups over fcNN even when fcNN and SNC are both used with ball-tree data structures, hashing, and LMNN dimensionality reduction-demonstrating that it is complementary to existing state-of-the-art algorithms to speed up fcNN classification and leads to substantial further improvements.

Original languageEnglish
Title of host publication31st International Conference on Machine Learning, ICML 2014
PublisherInternational Machine Learning Society (IMLS)
Pages2051-2059
Number of pages9
ISBN (Electronic)9781634393973
StatePublished - 2014
Event31st International Conference on Machine Learning, ICML 2014 - Beijing, China
Duration: Jun 21 2014Jun 26 2014

Publication series

Name31st International Conference on Machine Learning, ICML 2014
Volume3

Conference

Conference31st International Conference on Machine Learning, ICML 2014
Country/TerritoryChina
CityBeijing
Period06/21/1406/26/14

Fingerprint

Dive into the research topics of 'Stochastic neighbor compression'. Together they form a unique fingerprint.

Cite this