Papperoni

1989

Phoneme Recognition Using Time-Delay Neural Networks

A. H. Waibel, T. Hanazawa, Geoffrey Hinton, K. Shikano, K. Lang


Cite Score: 71

AI summary

This paper introduces the Time-Delay Neural Network (TDNN) for phoneme recognition, a 3-layer architecture trained with error backpropagation. On speaker-dependent recognition of the phonemes "B", "D", and "G", the TDNN reaches 98.5% correct over 1946 test tokens from three speakers, compared with 93.7% for the best of the discrete Hidden Markov Models trained on the same task.

Main Contributions

Introduces the Time-Delay Neural Network (TDNN) architecture for phoneme recognition.
Demonstrates the ability of TDNNs to learn acoustic-phonetic features independent of their temporal position.
Achieves a 98.5% recognition rate on speaker-dependent recognition of the phonemes "B", "D", and "G" over 1946 test tokens from three speakers, compared with 93.7% for the best discrete Hidden Markov Model (HMM) trained on the same task.
Shows that the TDNN "invents" well-known acoustic-phonetic features such as F2-rise, F2-fall, and vowel-onset as useful abstractions.
Shows that the TDNN learns alternate internal representations to link different acoustic realizations to the same concept.

Abstract

In this paper we present a Time-Delay Neural Network (TDNN) approach to phoneme recognition which is characterized by two important properties. 1) Using a 3 layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces. The TDNN learns these decision surfaces automatically using error backpropagation [1]. 2) The time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independent of position in time and hence not blurred by temporal shifts in the input. As a recognition task, the speaker-dependent recognition of the phonemes "B," "D," and "G" in varying phonetic contexts was chosen. For comparison, several discrete Hidden Markov Models (HMM) were trained to perform the same task. Performance evaluation over 1946 testing tokens from three speakers showed that the TDNN achieves a recognition rate of 98.5 percent correct while the rate obtained by the best of our HMM's was only 93.7 percent. Closer inspection reveals that the network "invented" well-known acoustic-phonetic features (e.g., F2-rise, F2-fall, vowel-onset) as useful abstractions. It also developed alternate internal representations to link different acoustic realizations to the same concept.
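
The time-delay arrangement described in the abstract can be illustrated with a small sketch. The code below is not the paper's network: the layer sizes, random weights, sigmoid units, and the helper names (`time_delay_layer`, `integrate_over_time`, `make_input`) are all assumptions chosen for illustration. It shows one layer whose units apply the same weights to every window of consecutive frames, so an acoustic event that arrives later produces the same activations, only later, and a score summed over time is unchanged.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def time_delay_layer(frames, weights, biases):
    """Apply one time-delay layer (illustrative, not the paper's exact network).

    frames:  list of T frames, each a list of F feature values
    weights: weights[u][d][f] for unit u, delay d, feature f
    biases:  one bias per unit
    Returns T - D + 1 output frames, each holding one activation per unit.
    """
    n_delays = len(weights[0])
    outputs = []
    for t in range(len(frames) - n_delays + 1):
        window = frames[t:t + n_delays]
        activations = []
        # The same weights are reused at every time position t.
        for unit_weights, bias in zip(weights, biases):
            total = bias
            for d, frame in enumerate(window):
                for f, value in enumerate(frame):
                    total += unit_weights[d][f] * value
            activations.append(sigmoid(total))
        outputs.append(activations)
    return outputs


def integrate_over_time(outputs):
    """Sum each unit's activation over all time positions."""
    n_units = len(outputs[0])
    return [sum(frame[u] for frame in outputs) for u in range(n_units)]


def make_input(event, start, n_frames, n_features):
    """Place a short event at frame `start` inside otherwise silent input."""
    frames = [[0.0] * n_features for _ in range(n_frames)]
    for offset, frame in enumerate(event):
        frames[start + offset] = list(frame)
    return frames


# Toy sizes, assumed for illustration only.
rng = random.Random(0)
N_FEATURES, N_DELAYS, N_UNITS, N_FRAMES = 4, 3, 2, 12
weights = [[[rng.uniform(-1, 1) for _ in range(N_FEATURES)]
            for _ in range(N_DELAYS)] for _ in range(N_UNITS)]
biases = [rng.uniform(-1, 1) for _ in range(N_UNITS)]
event = [[rng.uniform(0, 1) for _ in range(N_FEATURES)] for _ in range(3)]

SHIFT = 3
early = time_delay_layer(make_input(event, 2, N_FRAMES, N_FEATURES), weights, biases)
late = time_delay_layer(make_input(event, 2 + SHIFT, N_FRAMES, N_FEATURES), weights, biases)

# Shifting the input by SHIFT frames shifts the layer output by SHIFT frames.
shifted_copy = all(early[t] == late[t + SHIFT] for t in range(len(early) - SHIFT))
# Summing over time removes the dependence on where the event occurred.
same_score = all(math.isclose(a, b, rel_tol=1e-9)
                 for a, b in zip(integrate_over_time(early), integrate_over_time(late)))
print(shifted_copy, same_score)
```

The equality holds here because the event sits well away from both edges and the surrounding frames are all zero; with real speech, context near the boundaries would make the match approximate rather than exact.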


References [44]


[1]Learning Representations by Back-Propagating Errors

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

34 papers in library cite

Introduced backprop. Short and simple.

[2]Learning Internal Representations by Error Propagation

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

46 papers in library cite

I expected very little of this, but it was so good at explaining concepts! Very good read. It gets a bit boring when it starts explaining things toward the end of the chapter, but good nonetheless.

[3]Phoneme Recognition Using Time-Delay Neural Networks

A. H. Waibel, T. Hanazawa, Geoffrey Hinton, K. Shikano, K. Lang - 1989

13 papers in library cite

I didn't really like this paper, although it's a good early example of using backprop. I think it can be thought of as a 1D convolution.

[4]Connectionist Learning Procedures

Geoffrey E. Hinton - 1987

11 papers in library cite

It's a very good overview of everything that was happening in 1987! A bit too long, but a good start nonetheless.

[5]Perceptrons

M. Minsky, S. Papert - 1969

12 papers in library cite

Book, 500 pages

[6]Parallel Models of Associative Memory

Geoffrey E. Hinton, J. A. Anderson - 1981

4 papers in library cite

This is actually a book; I only found chapter 6 online.

[7]Parallel Distributed Processing

D. E. Rumelhart, J. L. McClelland, PDP Research Group - 1986

15 papers in library cite

[8]Speaker-Independent Phone Recognition Using Hidden Markov Models

K. F. Lee, H. W. Hon - 1989

5 papers in library cite

[9]An Introduction to Hidden Markov Models

L. R. Rabiner, B. Juang - 1986

2 papers in library cite

[10]Byblos: The BBN Continuous Speech Recognition System

Y. Chow, M. Dunham, O. Kimball, M. Krasner, G. Kubala, John Makhoul, P. Price, S. Roucos, Richard Schwartz - 1987

2 papers in library cite

[11]Continuous Speech Recognition by Statistical Methods

Frederick Jelinek - 1976

2 papers in library cite

[12]The Acoustic-Modeling Problem in Automatic Speech Recognition

P. Brown - 1987

2 papers in library cite

[13]A Connectionist Approach to Word Sense Disambiguation

G. W. Cottrell - 1985

1 paper in library cites

[14]A Neural Network Architecture Which Is Well Suited for VLSI Implementation

J. L. Holt, J. N. Hwang - 1988

1 paper in library cites

[15]An Introduction to Computing With Neural Nets

R. P. Lippmann - 1987

1 paper in library cites

Missing year

[16]ATR Speech Database

T. Morimoto, K. Takeda, S. Katagiri

1 paper in library cites

[17]Auditory Features in Consonant Discrimination

K. N. Stevens, S. E. Blumstein - 1981

1 paper in library cites

[18]Connected Speech Recognition Using a Neural Network

W. Y. Huang, R. P. Lippmann - 1987

1 paper in library cites

[19]Connectionist Speech Recognition

D. Burr - 1986

1 paper in library cites

[20]Constraints in Speech Perception

M. I. Posner, M. K. Marin, H. Remez - 1977

1 paper in library cites

[21]Context-Dependent Modeling for Acoustic-Phonetic Recognition of Continuous Speech

Richard Schwartz, Y. Chow, O. Kimball, S. Roucos, M. Krasner, John Makhoul - 1985

1 paper in library cites

[22]Context-Dependent Phonetic Markov Models for Large Vocabulary Speech Recognition

A. M. Derouault - 1987

1 paper in library cites

[23]Continuous Speech Recognition

J. Bridle, R. Chamberlain, J. Scotland - 1983

1 paper in library cites

[24]Continuous Speech Recognition by a Segmentation-Free, Large-Vocabulary Acoustic Processor

T. Kohonen - 1988

1 paper in library cites

[25]Evaluation of LPC Spectral Matching Measures for Phonetic Unit Recognition

K. Shikano - 1985

1 paper in library cites

[26]Generalization and Parameter Estimation in Connectionist Learning: Some Experiments on the Spelling of Speech

T. J. Sejnowski, C. R. Rosenberg - 1987

1 paper in library cites

[27]Invariant Acoustic Features in Vowels in Various Contexts

V. W. Zue - 1981

1 paper in library cites

[28]Japanese Speech Database With Fine Acoustic-Phonetic Transcriptions

Y. Sagisaka, K. Takeda, S. Katagiri, H. Kuwabara - 1987

1 paper in library cites

[29]Learning Phonetic Features Using Self-Organizing Vector Maps

T. Kohonen, K. Makisara, O. Simula - 1984

1 paper in library cites

[30]Learning to Predict the Next Word

Jeffrey L. Elman - 1988

1 paper in library cites

[31]LPC Peak Weighted Spectral Matching Measures

M. Sugiyama, K. Shikano - 1981

1 paper in library cites

[32]Perception and Energy Spectra for Vowels in Various Contexts

V. W. Zue - 1980

1 paper in library cites

[33]Perception of Complex Sounds

R. Plomp - 1976

1 paper in library cites

[34]Phonetic Recognition Experiments With Semi-Custom CMOS Neural Net Chips

M. Jabri, J. Ayat - 1988

1 paper in library cites

[35]Phonetic Typewriter

R. Cole, L. Hirschman, L. Atlas, M. Beckman - 1983

1 paper in library cites

[36]Recognition of Isolated Digits Using Hidden Markov Models With Continuous Mixture Densities

L. R. Rabiner, B. H. Juang, S. E. Levinson, M. M. Sondhi - 1985

1 paper in library cites

[37]Shift Invariance and Signal Prediction

S. Minami, K. Nakayama - 1987

1 paper in library cites

[38]Shift-Invariant Feature Detection

W. H. Sit, R. J. Doolen - 1982

1 paper in library cites

[39]Some Experiments With Large-Vocabulary Isolated-Word Sentence Recognition

L. R. Bahl, S. K. Das, P. V. D. Souza, Frederick Jelinek, S. Katz, R. L. Mercer, M. A. Picheny - 1984

1 paper in library cites

[40]Speech and Language

J. R. Searle - 1975

1 paper in library cites

[41]Speech Recognition

A. Waibel - 1987

1 paper in library cites

[42]Speech Recognition With Back Propagation

M. A. Franzini - 1987

1 paper in library cites

[43]Spoken Word Recognition Using Vector Quantization in Power-Spectrum Vector Space

K. Aikawa, K. Shikano - 1985

1 paper in library cites

[44]Stochastic Modeling as a Means of Automatic Speech Recognition

J. K. Baker - 1975

1 paper in library cites

Cited by 13 papers in your library

Cites 7 papers in your library

Read on June 24, 2025

I didn't really like this paper, although it's a good early example of using backprop. I think it can be thought of as a 1D convolution.
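
The 1D-convolution reading in the note above can be written out explicitly. This is a sketch in notation chosen here for illustration, not the paper's own: $x_i(t)$ is input feature $i$ at frame $t$, $w_{jid}$ is the weight that unit $j$ applies to feature $i$ at delay $d$, $b_j$ is a bias, $I$ is the number of input features, $N$ is the number of delays, and $\sigma$ is the unit's nonlinearity.

```latex
h_j(t) = \sigma\!\Big( b_j + \sum_{d=0}^{N} \sum_{i=1}^{I} w_{jid}\, x_i(t-d) \Big)
```

Because $w_{jid}$ does not depend on $t$, the same kernel slides along the time axis, which is what a 1D convolution with $I$ input channels and kernel width $N+1$ computes.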
