New neural network for articulatory speech recognition and its application to vowel identification

Research output: Contribution to journal > Article > peer-review

15 Scopus citations

Abstract

A system for automatic speech recognition (ASR) based on a new neural network design and a theory of articulatory phonology is presented. This system operates in two stages. In the first, speech acoustics are mapped by a neural network onto the movements of the tongue and lips that produced those acoustics (the neural networks are trained on X-ray microbeam recordings of actual articulatory movements); in the second stage, gestures are recovered from those movements. The neural network is built around a new objective function, Correlational + Scaling Error (COSE). When compared to a traditional neural network system, the COSE system trains faster, produces output which better represents the shape of the articulatory movements, and yields higher recognition rates for vowel gestures. After training on two speakers, recognition rates up to 96% for tokens from the training set and 87% for tokens spoken by a novel speaker were achieved.
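The abstract names the COSE objective but does not give its formula. As an illustration only, the sketch below combines a correlation term (how well the predicted trajectory follows the shape of the measured one) with a scaling term (how well its amplitude matches). The function name `cose_like_loss`, the `scale_weight` parameter, and the specific form of both terms are assumptions made for this example, not the definition used in the paper.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _std(xs):
    # Population standard deviation of a trajectory.
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def pearson(pred, target):
    # Pearson correlation between predicted and measured trajectories.
    mp, mt = _mean(pred), _mean(target)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target)) / len(pred)
    denom = _std(pred) * _std(target)
    # A flat trajectory has no defined correlation; treat it as uncorrelated.
    return cov / denom if denom > 0 else 0.0


def cose_like_loss(pred, target, scale_weight=1.0):
    # Hypothetical "correlation + scaling" error, NOT the paper's formula:
    #   shape term: 1 - r, zero when the shapes match exactly
    #   scale term: squared difference of standard deviations
    corr_term = 1.0 - pearson(pred, target)
    scale_term = (_std(pred) - _std(target)) ** 2
    return corr_term + scale_weight * scale_term


if __name__ == "__main__":
    measured = [0.0, 1.0, 2.0, 3.0]   # e.g. one articulator coordinate over time
    print(cose_like_loss(measured, measured))              # same shape and scale
    print(cose_like_loss([0.0, 2.0, 4.0, 6.0], measured))  # same shape, wrong scale
    print(cose_like_loss([3.0, 2.0, 1.0, 0.0], measured))  # right scale, inverted shape
```

Under these assumptions, a prediction with the right shape but the wrong amplitude is penalized only by the scaling term, while an inverted trajectory is penalized only by the correlation term. A plain squared-error objective would not separate the two.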

Original language: English
Pages (from-to): 189-209
Number of pages: 21
Journal: Computer Speech and Language
Volume: 8
Issue number: 3
State: Published - Jul 1994
