TY - JOUR
T1 - Sparse Auditory Reproducing Kernel (SPARK) features for noise-robust speech recognition
AU - Fazel, Amin
AU - Chakrabartty, Shantanu
N1 - Funding Information:
Shantanu Chakrabartty received the B.Tech. degree from the Indian Institute of Technology, Delhi, in 1996 and the M.S. and Ph.D. degrees in electrical engineering from Johns Hopkins University, Baltimore, MD, in 2002 and 2005, respectively. He is currently an Associate Professor in the Department of Electrical and Computer Engineering, Michigan State University (MSU), East Lansing. From 1996 to 1999, he was with Qualcomm, Inc., San Diego, CA, and during 2002 he was a Visiting Researcher at The University of Tokyo. His work covers different aspects of analog computing, in particular nonvolatile circuits, and his current research interests include energy harvesting sensors and neuromorphic and hybrid circuits and systems. Dr. Chakrabartty was a Catalyst Foundation fellow from 1999 to 2004 and is a recipient of the National Science Foundation’s CAREER award and the University Teacher-Scholar Award from MSU. He is currently serving as an Associate Editor for the IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS, an Associate Editor for the Advances in Artificial Neural Systems journal, and a Review Editor for the Frontiers of Neuromorphic Engineering journal.
PY - 2012
Y1 - 2012
AB - In this paper, we present a novel speech feature extraction algorithm based on a hierarchical combination of auditory similarity and pooling functions. The computationally efficient features, known as Sparse Auditory Reproducing Kernel (SPARK) coefficients, are extracted under the hypothesis that the noise-robust information in the speech signal is embedded in a reproducing kernel Hilbert space (RKHS) spanned by overcomplete, nonlinear, and time-shifted gammatone basis functions. The feature extraction algorithm first computes a kernel-based similarity between the speech signal and the time-shifted gammatone functions, and then prunes the features using a simple pooling technique (MAX operation). We also describe the effect of different hyper-parameters and kernel functions on the performance of a SPARK-based speech recognizer. Experimental results on the standard AURORA2 dataset demonstrate that the SPARK-based speech recognizer delivers consistent improvements in word accuracy when compared with a baseline speech recognizer trained using the standard ETSI STQ WI008 DSR features.
KW - Auditory HMAX
KW - gammatone functions
KW - reproducing kernel Hilbert space (RKHS)
KW - robust speech recognition
KW - sparse features
UR - http://www.scopus.com/inward/record.url?scp=84857464869&partnerID=8YFLogxK
U2 - 10.1109/TASL.2011.2179294
DO - 10.1109/TASL.2011.2179294
M3 - Article
AN - SCOPUS:84857464869
SN - 1558-7916
VL - 20
SP - 1362
EP - 1371
JO - IEEE Transactions on Audio, Speech and Language Processing
JF - IEEE Transactions on Audio, Speech and Language Processing
IS - 4
M1 - 6099594
ER -