Sparse Auditory Reproducing Kernel (SPARK) features for noise-robust speech recognition

Research output: Contribution to journal › Article › peer-review

16 Scopus citations

Abstract

In this paper, we present a novel speech feature extraction algorithm based on a hierarchical combination of auditory similarity and pooling functions. The computationally efficient features, known as Sparse Auditory Reproducing Kernel (SPARK) coefficients, are extracted under the hypothesis that the noise-robust information in a speech signal is embedded in a reproducing kernel Hilbert space (RKHS) spanned by overcomplete, nonlinear, and time-shifted gammatone basis functions. The feature extraction algorithm first computes a kernel-based similarity between the speech signal and the time-shifted gammatone functions, and then prunes the resulting features using a simple pooling technique (the MAX operation). We also describe the effect of different hyper-parameters and kernel functions on the performance of a SPARK-based speech recognizer. Experimental results on the standard AURORA2 dataset demonstrate that the SPARK-based speech recognizer delivers consistent improvements in word accuracy over a baseline speech recognizer trained using the standard ETSI STQ WI008 DSR features.
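The two stages named in the abstract (kernel similarity against time-shifted gammatone functions, then MAX pooling) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gammatone` and `spark_like_features`, the polynomial kernel, the ERB-based bandwidth, the atom length, and the shift step are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def gammatone(fc, fs, length, order=4):
    """Unit-norm gammatone atom at center frequency fc (Hz), sampled at fs (Hz).

    The bandwidth follows the common ERB approximation; this choice is an
    assumption and is not taken from the abstract.
    """
    t = np.arange(length) / fs
    bandwidth = 1.019 * (24.7 + 0.108 * fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * bandwidth * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)


def spark_like_features(frame, center_freqs, fs, atom_len=128, shift_step=1, degree=2):
    """Return one pooled kernel-similarity value per center frequency.

    Stage 1: polynomial-kernel similarity (an assumed kernel) between the
             frame and every time-shifted copy of each gammatone atom.
    Stage 2: MAX pooling over the time shifts.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame / (np.linalg.norm(frame) + 1e-12)  # normalize frame energy
    shifts = range(0, len(frame) - atom_len + 1, shift_step)
    feats = np.empty(len(center_freqs))
    for i, fc in enumerate(center_freqs):
        g = gammatone(fc, fs, atom_len)
        # The inner product with a shifted atom equals the inner product of
        # the atom with the matching slice of the frame.
        sims = [(1.0 + frame[s:s + atom_len] @ g) ** degree for s in shifts]
        feats[i] = max(sims)  # MAX pooling across shifts
    return feats
```

For example, calling `spark_like_features(frame, [500.0, 1000.0, 2000.0], fs=8000)` on a 50 ms frame returns three values, one per gammatone channel. Swapping the polynomial kernel for another kernel only changes the line that builds `sims`.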

Original language: English
Article number: 6099594
Pages (from-to): 1362-1371
Number of pages: 10
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 20
Issue number: 4
State: Published - 2012

Keywords

  • Auditory HMAX
  • gammatone functions
  • reproducing kernel Hilbert space (RKHS)
  • robust speech recognition
  • sparse features
