A large-margin learning machine for sparse probability regression is presented. Unlike support vector machines and other forms of kernel machines, nonlinear features are obtained by transforming labels into higher-dimensional label space rather than transforming data vectors into feature space. Linear multi-class logistic regression with partitioned classes of labels yields a nonlinear classifier in the original labels. With a linear kernel in data space, storage and run-time requirements are reduced from the number of support vectors to the number of partitioned labels. Using the partitioning property of KL-divergence in label space, an iterative alignment procedure produces sparse training coefficients. Experiments show that label partitioning is effective in modeling nonlinear decision boundaries with same, and in some cases superior, generalization performance to Support Vector Machines with significantly reduced memory and run-time requirements.
|Number of pages||12|
|Journal||Lecture Notes in Computer Science|
|State||Published - 2003|
|Event||16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003 - Washington, DC, United States|
Duration: Aug 24 2003 → Aug 27 2003