TY - JOUR
T1 - Gini support vector machines for segmental minimum Bayes risk decoding of continuous speech
AU - Venkataramani, Veera
AU - Chakrabartty, Shantanu
AU - Byrne, William
PY - 2007/7
Y1 - 2007/7
N2 - We describe the use of support vector machines (SVMs) for continuous speech recognition by incorporating them in segmental minimum Bayes risk decoding. Lattice cutting is used to convert the Automatic Speech Recognition search space into sequences of smaller recognition problems. SVMs are then trained as discriminative models over each of these problems and used in a rescoring framework. We pose the estimation of a posterior distribution over hypotheses in these regions of acoustic confusion as a logistic regression problem. We also show that GiniSVMs can be used as an approximation technique to estimate the parameters of the logistic regression problem. On a small vocabulary recognition task we show that the use of GiniSVMs can improve the performance of a well trained hidden Markov model system trained under the Maximum Mutual Information criterion. We also find that it is possible to derive reliable confidence scores over the GiniSVM hypotheses and that these can be used to good effect in hypothesis combination. We discuss the problems that we expect to encounter in extending this approach to large vocabulary continuous speech recognition and describe initial investigation of constrained estimation techniques to derive feature spaces for SVMs.
UR - http://www.scopus.com/inward/record.url?scp=33847611644&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2006.08.002
DO - 10.1016/j.csl.2006.08.002
M3 - Article
AN - SCOPUS:33847611644
SN - 0885-2308
VL - 21
SP - 423
EP - 442
JO - Computer Speech and Language
JF - Computer Speech and Language
IS - 3
ER -