Forward Decoding Kernel Machines (FDKM) combine large-margin classifiers with Hidden Markov Models (HMMs) for Maximum a Posteriori (MAP) adaptive sequence estimation. State transitions in the sequence are conditioned on observed data using a kernel-based probability model, and forward decoding of the state transition probabilities with the sum-product algorithm directly produces the MAP sequence. The parameters of the probabilistic model are trained using a recursive scheme that maximizes a lower bound on the regularized cross-entropy. The recursion performs an expectation step on the outgoing state of the transition probability model, using the posterior probabilities produced by the previous maximization step. Similar to Expectation-Maximization (EM), the FDKM recursion deals effectively with noisy and partially labeled data. We also introduce GiniSVM, a multi-class support vector machine for sparse conditional probability regression, based on a quadratic formulation of entropy. Experiments with benchmark classification data show that GiniSVM generalizes better than other multi-class SVM techniques. In conjunction with FDKM, GiniSVM produces a sparse kernel expansion of state transition probabilities, with drastically fewer non-zero coefficients than kernel logistic regression. Preliminary evaluation of FDKM with GiniSVM on a subset of the TIMIT speech database reveals significant improvements in phoneme recognition accuracy over other SVM and HMM techniques.
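To make the forward-decoding step concrete, the following is a minimal sketch of a standard forward (sum-product) recursion over data-conditioned transition probabilities. It is an illustration under stated assumptions, not the implementation evaluated here: the function name `forward_decode`, the array layout, the per-step normalization, and the per-step argmax used to read off a state estimate are all hypothetical choices, and the kernel-based transition model is replaced by hand-set toy matrices.

```python
import numpy as np

def forward_decode(trans, alpha0):
    """Forward (sum-product) recursion over data-conditioned transitions.

    Assumed layout (hypothetical, for illustration only):
      trans[n, i, j] = P(state i at step n | state j at step n-1, x[n]),
                       so each column of trans[n] sums to 1.
      alpha0[j]      = initial probability of state j.
    Returns the forward state probabilities per step and their argmax.
    """
    trans = np.asarray(trans, dtype=float)
    alpha = np.asarray(alpha0, dtype=float)
    alphas = np.empty((trans.shape[0], alpha.shape[0]))
    for n, P in enumerate(trans):
        alpha = P @ alpha        # sum over the previous (outgoing) state j
        alpha /= alpha.sum()     # renormalize to guard against numerical drift
        alphas[n] = alpha
    # Assumption: the state estimate is read off as the most probable state per step.
    return alphas, alphas.argmax(axis=1)

# Toy stand-in for the kernel model's output: 3 time steps, 2 states.
toy_trans = np.array([
    [[0.9, 0.2], [0.1, 0.8]],
    [[0.1, 0.1], [0.9, 0.9]],
    [[0.8, 0.3], [0.2, 0.7]],
])
alphas, path = forward_decode(toy_trans, [0.5, 0.5])
print(path.tolist())
```

In a full system, each `trans[n]` would be produced by the trained probability model evaluated on the observation at step `n`; the recursion only needs past and present data, which is what allows decoding to run forward in time.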