2012

Deep Neural Networks for Acoustic Modeling in Speech Recognition

Geoffrey Hinton

Cite Score: 88

AI summary

This paper reviews the use of deep neural networks (DNNs) for acoustic modeling in speech recognition. It reports that DNNs with multiple hidden layers, trained with new methods, outperform Gaussian mixture models (GMMs) on TIMIT and on large-vocabulary tasks.

Main Contributions

  • Introduces a two-stage training procedure for fitting DNNs: generative pre-training and discriminative fine-tuning.
  • Demonstrates that DNNs can outperform GMMs at acoustic modeling for speech recognition on a variety of datasets.
  • Shows that pre-training the DBN-DNN leads to the best results, but it is not critical, especially when using five or more hidden layers.
  • Details experiments on the TIMIT database and on the Bing Voice Search, Switchboard, YouTube, and English Broadcast News tasks.
  • Discusses alternative pre-training and fine-tuning methods for DNNs.
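
The two-stage procedure named in the first contribution can be sketched as follows. This is a minimal illustration, not the paper's setup: it pre-trains a single restricted Boltzmann machine with one step of contrastive divergence (CD-1) and then fine-tunes with backpropagation on a softmax output. The toy binary data, the layer sizes, the learning rates, and all function names are assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(W, b_hid, v):
    # W[j][i] connects visible unit i to hidden unit j.
    return [sigmoid(sum(w * x for w, x in zip(row, v)) + b)
            for row, b in zip(W, b_hid)]

def visible_probs(W, b_vis, h):
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + b_vis[i])
            for i in range(len(b_vis))]

def pretrain_rbm(data, n_hid, rng, epochs=20, lr=0.1):
    """Stage 1 (generative): CD-1 updates; no labels are used."""
    n_vis = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_vis)] for _ in range(n_hid)]
    b_vis, b_hid = [0.0] * n_vis, [0.0] * n_hid
    for _ in range(epochs):
        for v0 in data:
            h0 = hidden_probs(W, b_hid, v0)
            h_sample = [1.0 if rng.random() < p else 0.0 for p in h0]
            v1 = visible_probs(W, b_vis, h_sample)  # reconstruction
            h1 = hidden_probs(W, b_hid, v1)
            for j in range(n_hid):
                for i in range(n_vis):
                    W[j][i] += lr * (h0[j] * v0[i] - h1[j] * v1[i])
                b_hid[j] += lr * (h0[j] - h1[j])
            for i in range(n_vis):
                b_vis[i] += lr * (v0[i] - v1[i])
    return W, b_hid

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def forward(W, b_hid, U, c, v):
    h = hidden_probs(W, b_hid, v)
    p = softmax([sum(u * x for u, x in zip(row, h)) + b for row, b in zip(U, c)])
    return h, p

def finetune(data, labels, W, b_hid, n_classes, rng, epochs=300, lr=0.5):
    """Stage 2 (discriminative): backpropagate cross-entropy through all weights.

    W and b_hid start from the pre-trained values and are updated in place.
    """
    n_hid = len(W)
    U = [[rng.gauss(0.0, 0.01) for _ in range(n_hid)] for _ in range(n_classes)]
    c = [0.0] * n_classes
    for _ in range(epochs):
        for v, y in zip(data, labels):
            h, p = forward(W, b_hid, U, c, v)
            d_out = [p[k] - (1.0 if k == y else 0.0) for k in range(n_classes)]
            d_hid = [sum(U[k][j] * d_out[k] for k in range(n_classes))
                     * h[j] * (1.0 - h[j]) for j in range(n_hid)]
            for k in range(n_classes):
                for j in range(n_hid):
                    U[k][j] -= lr * d_out[k] * h[j]
                c[k] -= lr * d_out[k]
            for j in range(n_hid):
                for i in range(len(v)):
                    W[j][i] -= lr * d_hid[j] * v[i]
                b_hid[j] -= lr * d_hid[j]
    return U, c

def predict(W, b_hid, U, c, v):
    p = forward(W, b_hid, U, c, v)[1]
    return p.index(max(p))

# Hypothetical toy data: two clusters of 6-bit patterns with class labels.
rng = random.Random(0)
data = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]]
labels = [0, 0, 1, 1]
W, b_hid = pretrain_rbm(data, n_hid=4, rng=rng)
U, c = finetune(data, labels, W, b_hid, n_classes=2, rng=rng)
predictions = [predict(W, b_hid, U, c, v) for v in data]
```

A deeper network would repeat stage 1 layer by layer, feeding each trained layer's hidden activations to the next as input, before running stage 2 on the whole stack.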

Abstract

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks that have many hidden layers and are trained using new methods have been shown to outperform Gaussian mixture models on a variety of speech recognition benchmarks, sometimes by a large margin. This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
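
The feed-forward alternative described in the abstract can be illustrated with a small sketch: stack a window of frames into one input vector, pass it through sigmoid hidden layers, and apply a softmax to obtain a posterior distribution over HMM states. The dimensions (13 coefficients per frame, 5 frames of context on each side, 6 states), the untrained random weights, and the function names are all assumptions for illustration only.

```python
import math
import random

def stack_frames(frames, center, context):
    """Concatenate frames center-context .. center+context, clamping at the edges."""
    window = []
    last = len(frames) - 1
    for t in range(center - context, center + context + 1):
        window.extend(frames[min(max(t, 0), last)])
    return window

def affine(weights, biases, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def softmax(v):
    m = max(v)
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(rng, n_in, n_out, scale=0.1):
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def state_posteriors(layers, x):
    """Sigmoid hidden layers followed by a softmax over HMM states."""
    *hidden, output = layers
    for w, b in hidden:
        x = sigmoid(affine(w, b, x))
    w, b = output
    return softmax(affine(w, b, x))

# Hypothetical sizes; the weights are random, so the output is not meaningful.
rng = random.Random(0)
n_coeffs, context, n_states = 13, 5, 6
frames = [[rng.gauss(0.0, 1.0) for _ in range(n_coeffs)] for _ in range(20)]
sizes = [n_coeffs * (2 * context + 1), 32, 32, n_states]
layers = [init_layer(rng, n_in, n_out) for n_in, n_out in zip(sizes, sizes[1:])]
posteriors = state_posteriors(layers, stack_frames(frames, center=0, context=context))
```

The returned list has one entry per HMM state and sums to one, which is what lets the network stand in for the GMM when scoring how well each state fits the current window of frames.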

Paper Aliases

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups