Abstract
The performance of speech recognition systems depends on consistent quality of the speech features across variable environmental conditions encountered during training and evaluation. This paper presents a kernel-based nonlinear predictive coding procedure that yields speech features which are robust to nonstationary noise contaminating the speech signal. Features maximally insensitive to additive noise are obtained by growth transformation of regression functions that span a reproducing kernel Hilbert space (RKHS). The features are normalized by construction and extract information pertaining to higher-order statistical correlations in the speech signal. Experiments with the TI-DIGIT database demonstrate consistent robustness to noise of varying statistics, yielding significant improvements in digit recognition accuracy over identical models trained using Mel-scale cepstral features and evaluated at noise levels between 0 and 30-dB signal-to-noise ratio.
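As an illustration of the evaluation condition named in the abstract (noise levels between 0 and 30 dB signal-to-noise ratio), the following is a minimal sketch of mixing a noise signal into clean speech at a chosen SNR. It is not the paper's procedure. The function names `mix_at_snr` and `measured_snr_db`, the synthetic 440 Hz tone standing in for speech, and the white Gaussian noise are all assumptions made for this example; the paper concerns nonstationary noise, which this stand-in does not model.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("both signals must have nonzero power")
    # From SNR_dB = 10 * log10(P_speech / P_noise), solve for the target noise power.
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_p_noise / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR in dB from a clean signal and its noisy version."""
    p_clean = sum(c * c for c in clean) / len(clean)
    p_resid = sum((y - c) ** 2 for c, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_clean / p_resid)


# Hypothetical stand-ins: one second of a 440 Hz tone at 8 kHz, plus white Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

for snr_db in (0, 10, 20, 30):
    noisy = mix_at_snr(speech, noise, snr_db)
    print(snr_db, round(measured_snr_db(speech, noisy), 3))
```

Scaling the noise rather than the speech keeps the clean signal's level fixed across conditions, so any change in recognition accuracy can be attributed to the noise level alone.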
Original language | English |
---|---|
Article number | 4276746 |
Pages (from-to) | 1842-1849 |
Number of pages | 8 |
Journal | IEEE Transactions on Audio, Speech and Language Processing |
Volume | 15 |
Issue number | 6 |
State | Published - Aug 2007 |
Keywords
- Feature extraction
- Growth transforms
- Noise robustness
- Nonlinear signal processing
- Reproducing kernel Hilbert space
- Speaker verification