1989

Phoneme Recognition Using Time-Delay Neural Networks

A. H. Waibel, T. Hanazawa, Geoffrey Hinton, K. Shikano, K. Lang

Cite Score

71

AI summary

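This paper introduces a Time-Delay Neural Network (TDNN) for phoneme recognition, built from a 3-layer arrangement of simple computing units and trained with error backpropagation. On speaker-dependent recognition of the phonemes "B", "D", and "G", the TDNN achieves a 98.5% recognition rate over 1946 testing tokens from three speakers, compared with 93.7% for the best of the discrete Hidden Markov Models (HMMs) trained on the same task.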

Main Contributions

  • Introduces the Time-Delay Neural Network (TDNN) architecture for phoneme recognition.
  • Demonstrates the ability of TDNNs to learn acoustic-phonetic features independent of their temporal position.
  • Achieves a 98.5% recognition rate on speaker-dependent recognition of the phonemes "B", "D", and "G" over 1946 testing tokens from three speakers, compared with 93.7% for the best discrete Hidden Markov Model (HMM) trained on the same task.
  • Shows that the TDNN "invented" well-known acoustic-phonetic features such as F2-rise, F2-fall, and vowel-onset as useful abstractions.
  • Shows that the TDNN developed alternate internal representations to link different acoustic realizations to the same concept.
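
To put the two recognition rates in perspective, the short sketch below converts them into approximate error counts and a relative error reduction. The function name `error_summary` is made up for illustration, and the counts are estimates only: the abstract reports rounded percentages, not exact numbers of misclassified tokens.

```python
def error_summary(tokens, accuracy_a, accuracy_b):
    """Compare two recognizers given their accuracies in percent.

    Returns approximate error counts (rounded, since the reported
    percentages are themselves rounded) and the relative reduction
    in error rate of system A over system B.
    """
    error_rate_a = 100.0 - accuracy_a
    error_rate_b = 100.0 - accuracy_b
    return {
        "errors_a": round(tokens * error_rate_a / 100.0),
        "errors_b": round(tokens * error_rate_b / 100.0),
        "relative_reduction": (error_rate_b - error_rate_a) / error_rate_b,
    }


# TDNN (98.5%) versus the best HMM (93.7%) on 1946 testing tokens.
summary = error_summary(1946, 98.5, 93.7)
print(summary)
```

Under these assumptions the TDNN makes roughly a quarter as many errors as the best HMM, which is a more telling comparison than the 4.8-point gap in accuracy.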

Abstract

In this paper we present a Time-Delay Neural Network (TDNN) approach to phoneme recognition which is characterized by two important properties. 1) Using a 3 layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces. The TDNN learns these decision surfaces automatically using error backpropagation [1]. 2) The time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independent of position in time and hence not blurred by temporal shifts in the input. As a recognition task, the speaker-dependent recognition of the phonemes "B," "D," and "G" in varying phonetic contexts was chosen. For comparison, several discrete Hidden Markov Models (HMM) were trained to perform the same task. Performance evaluation over 1946 testing tokens from three speakers showed that the TDNN achieves a recognition rate of 98.5 percent correct while the rate obtained by the best of our HMM's was only 93.7 percent. Closer inspection reveals that the network "invented" well-known acoustic-phonetic features (e.g., F2-rise, F2-fall, vowel-onset) as useful abstractions. It also developed alternate internal representations to link different acoustic realizations to the same concept.
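
The time-delay arrangement described in the abstract can be illustrated with a minimal sketch: one unit whose weights span a short window of input frames is applied at every time position, so the same feature detector fires wherever the pattern occurs. This is not the authors' implementation. The frame size, window length, weights, and the helper names `time_delay_layer`, `integrate_over_time`, and `place` are all assumptions chosen for illustration, and the weights are hand-picked rather than learned by backpropagation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def time_delay_layer(frames, weights, bias):
    """Apply one time-delay unit at every position of the input.

    frames:  list of T frames, each a list of F coefficients
    weights: list of D delay taps, each a list of F weights
    The same weights are reused at every time step (weight sharing).
    """
    window = len(weights)
    outputs = []
    for t in range(len(frames) - window + 1):
        total = bias
        for d in range(window):
            for f, w in enumerate(weights[d]):
                total += w * frames[t + d][f]
        outputs.append(sigmoid(total))
    return outputs


def integrate_over_time(activations):
    """Sum a unit's activations over all time positions."""
    return sum(activations)


def place(pattern, start, length):
    """Embed a pattern in an otherwise silent (all-zero) input."""
    width = len(pattern[0])
    frames = [[0.0] * width for _ in range(length)]
    for i, frame in enumerate(pattern):
        frames[start + i] = list(frame)
    return frames


# Hypothetical 3-frame pattern with 2 coefficients per frame.
pattern = [[0.2, 0.0], [0.6, 0.3], [1.0, 0.8]]
# Hypothetical hand-picked weights for a 3-frame window.
weights = [[-1.0, 0.0], [0.0, 0.5], [1.0, 1.0]]
bias = -0.5

out_early = time_delay_layer(place(pattern, 2, 10), weights, bias)
out_late = time_delay_layer(place(pattern, 5, 10), weights, bias)

# The activation trace shifts with the input; its time integral does not.
print(out_early[0:5] == out_late[3:8])
print(math.isclose(integrate_over_time(out_early),
                   integrate_over_time(out_late)))
```

Shifting the pattern by three frames shifts the unit's activation trace by three frames, while the sum over time stays the same as long as the pattern stays far enough from the edges of the input. That is the sense in which features are detected "independent of position in time".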

References [44]

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

34 papers in library cite

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

46 papers in library cite

A. H. Waibel, T. Hanazawa, Geoffrey Hinton, K. Shikano, K. Lang - 1989

13 papers in library cite

Geoffrey E. Hinton - 1987

11 papers in library cite

M. Minsky, S. Papert - 1969

12 papers in library cite

Geoffrey E. Hinton, J. A. Anderson - 1981

4 papers in library cite

D. E. Rumelhart, J. L. McClelland, PDP Research Group - 1986

15 papers in library cite

K. F. Lee, H. W. Hon - 1989

5 papers in library cite

L. R. Rabiner, B. Juang - 1986

2 papers in library cite

Y. Chow, M. Dunham, O. Kimball, M. Krasner, G. Kubala, John Makhoul, P. Price, S. Roucos, Richard Schwartz - 1987

2 papers in library cite

Frederick Jelinek - 1976

2 papers in library cite

P. Brown - 1987

2 papers in library cite

G. W. Cottrell - 1985

1 paper in library cites

J. L. Holt, J. N. Hwang - 1988

1 paper in library cites

R. P. Lippmann - 1987

1 paper in library cites

T. Morimoto, K. Takeda, S. Katagiri - year missing

1 paper in library cites

K. N. Stevens, S. E. Blumstein - 1981

1 paper in library cites

W. Y. Huang, R. P. Lippmann - 1987

1 paper in library cites

D. Burr - 1986

1 paper in library cites

M. I. Posner, M. K. Marin, H. Remez - 1977

1 paper in library cites

Richard Schwartz, Y. Chow, O. Kimball, S. Roucos, M. Krasner, John Makhoul - 1985

1 paper in library cites

A. M. Derouault - 1987

1 paper in library cites

J. Bridle, R. Chamberlain, J. Scotland - 1983

1 paper in library cites

K. Shikano - 1985

1 paper in library cites

T. J. Sejnowski, C. R. Rosenberg - 1987

1 paper in library cites

V. W. Zue - 1981

1 paper in library cites

Y. Sagisaka, K. Takeda, S. Katagiri, H. Kuwabara - 1987

1 paper in library cites

T. Kohonen, K. Makisara, O. Simula - 1984

1 paper in library cites

Jeffrey L. Elman - 1988

1 paper in library cites

M. Sugiyama, K. Shikano - 1981

1 paper in library cites

V. W. Zue - 1980

1 paper in library cites

R. Plomp - 1976

1 paper in library cites

M. Jabri, J. Ayat - 1988

1 paper in library cites

R. Cole, L. Hirschman, L. Atlas, M. Beckman - 1983

1 paper in library cites

L. R. Rabiner, B. H. Juang, S. E. Levinson, M. M. Sondhi - 1985

1 paper in library cites

S. Minami, K. Nakayama - 1987

1 paper in library cites

W. H. Sit, R. J. Doolen - 1982

1 paper in library cites

L. R. Bahl, S. K. Das, P. V. de Souza, Frederick Jelinek, S. Katz, R. L. Mercer, M. A. Picheny - 1984

1 paper in library cites

J. R. Searle - 1975

1 paper in library cites

A. Waibel - 1987

1 paper in library cites

M. A. Franzini - 1987

1 paper in library cites

K. Aikawa, K. Shikano - 1985

1 paper in library cites

J. K. Baker - 1975

1 paper in library cites
