Papperoni

2012

Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition

Navdeep Jaitly, P. Nguyen, A. Senior, Vincent Vanhoucke


Cite Score: 17

AI summary

This paper introduces a new ASR system that uses DBN-pretrained ANN/HMM models, trained on large datasets of Voice Search and YouTube data, to outperform GMM/HMM baselines by 3.7% and 4.7% absolute WER, respectively, with further gains from MMI fine-tuning and SCARF model combination.

Main Contributions

Demonstrates that ANN/HMM hybrids pretrained with DBNs can outperform GMM/HMM systems in ASR.
Uses two large datasets (5870 hours of Voice Search and 1400 hours of YouTube data) to train and evaluate the models.
Achieves a 3.7% absolute WER improvement over the GMM/HMM baseline on the Voice Search dataset.
Achieves a 4.7% absolute WER improvement over the GMM/HMM baseline on the YouTube dataset.
Shows additional gains from MMI fine-tuning and model combination using SCARF.
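All of the gains above are absolute WER, meaning differences in percentage points. The sketch below makes that concrete. WER is the word-level edit distance between a reference and a hypothesis, (S + D + I) / N, and a relative reduction divides the absolute gain by the baseline WER. The function names are hypothetical helpers, not code from the paper, and the example sentences and 20.0% baseline in the note after the block are made up.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions (Levenshtein distance
    over words) needed to turn the reference into the hypothesis."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (free if words match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: (S + D + I) / N * 100, N = reference length."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 100.0 * word_errors(reference, hypothesis) / n


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Gain in percentage points (how the 3.7% and 4.7% figures are stated)."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Gain as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

For example, `wer("call mom on her cell", "call tom on cell")` gives 40.0: one substitution plus one deletion over five reference words. With a made-up baseline of 20.0%, a 3.7-point absolute gain would be an 18.5% relative reduction, so the two kinds of numbers should not be compared directly.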

Abstract

The use of Deep Belief Networks (DBN) to pretrain Neural Networks has recently led to a resurgence in the use of Artificial Neural Network Hidden Markov Model (ANN/HMM) hybrid systems for Automatic Speech Recognition (ASR). In this paper we report results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any reported previously with DBN-pretrained ANN/HMM systems: 5870 hours of Voice Search and 1400 hours of YouTube data. On the first dataset, the pretrained ANN/HMM system outperforms the best Gaussian Mixture Model - Hidden Markov Model (GMM/HMM) baseline, built with a much larger dataset, by 3.7% absolute WER, while on the second dataset, it outperforms the GMM/HMM baseline by 4.7% absolute. Maximum Mutual Information (MMI) fine-tuning and model combination using Segmental Conditional Random Fields (SCARF) give additional gains of 0.1% and 0.4% absolute on the first dataset and 0.5% and 0.9% absolute on the second dataset.
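The abstract builds on DBN pretraining. Below is a minimal sketch of the usual greedy recipe from Hinton et al. [3], assuming Bernoulli-Bernoulli RBMs trained with one-step contrastive divergence (CD-1) on full batches. Each RBM is trained on the hidden activations of the one below it. In the usual DBN-DNN recipe, the stacked weights then initialize the hidden layers of the ANN, a softmax over HMM states is added, and the whole network is fine-tuned with backpropagation. The function names, layer sizes, learning rate, and toy data are illustrative assumptions. Real-valued acoustic features normally call for a Gaussian-Bernoulli first layer, which this sketch omits.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def cd1_update(W, b_vis, b_hid, v0, lr, rng):
    """One CD-1 step for a Bernoulli-Bernoulli RBM on a batch v0 (batch x n_vis).

    Updates W, b_vis and b_hid in place and returns the mean squared
    reconstruction error, a rough progress signal (not the RBM objective).
    """
    batch = v0.shape[0]
    # Positive phase: hidden probabilities given the data, plus a binary sample.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
    # Negative phase: one Gibbs step (reconstruct visibles, recompute hiddens),
    # using probabilities rather than samples to reduce noise.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # Approximate gradient: <v h>_data - <v h>_reconstruction.
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
    return float(((v0 - v1_prob) ** 2).mean())


def pretrain_dbn(data, layer_sizes, epochs=10, lr=0.1, seed=0):
    """Greedy layer-wise pretraining; returns [(W, b_hid), ...] per hidden layer."""
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float)
    layers = []
    for n_hid in layer_sizes:
        n_vis = x.shape[1]
        W = 0.01 * rng.standard_normal((n_vis, n_hid))
        b_vis = np.zeros(n_vis)
        b_hid = np.zeros(n_hid)
        for _ in range(epochs):
            cd1_update(W, b_vis, b_hid, x, lr, rng)
        layers.append((W, b_hid))
        # This RBM's hidden probabilities become the training data for the next one.
        x = sigmoid(x @ W + b_hid)
    return layers
```

For a toy binary matrix with 20 columns, `pretrain_dbn(toy, [16, 8])` returns two `(W, b_hid)` pairs with weight shapes (20, 16) and (16, 8). These are the weights that would seed the hidden layers before discriminative fine-tuning.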


References [16]


[1]Reducing the Dimensionality of Data With Neural Networks

Geoffrey Hinton, Ruslan Salakhutdinov - 2006

37 papers in library cite

I didn't like the way this is written; it's very hard to understand without a ton of background knowledge. But hey, it's one of the papers that kicked off deep learning!

[2]MapReduce: Simplified Data Processing on Large Clusters

Jeffrey Dean, Sanjay Ghemawat - 2004

4 papers in library cite

Amazing paper that explains how MapReduce works! Very simple, and it was really nice to read something not related to AI. A shame it's off-topic here.

[3]A Fast Learning Algorithm for Deep Belief Nets

Geoffrey E. Hinton, S. Osindero, Y. Teh - 2006

43 papers in library cite

The paper does not explain anything. It just throws out the idea and a bunch of math, but doesn't really bother to explain the concepts.

[4]Extracting and Composing Robust Features With Denoising Autoencoders

Pascal Vincent, Hugo Larochelle, Yoshua Bengio, Pierre Antoine Manzagol - 2008

25 papers in library cite

I am *so* glad we found an alternative to DBNs. It also introduced the idea of denoising, which is nice.

[5]Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

G. Dahl, D. Yu, L. Deng, Alex Acero - 2012

19 papers in library cite

Good paper, very well written, and probably the best explanation of RBMs and DBNs I've seen. However, I don't see a lot of impact, and it seems very derivative of other work.

[6]Improving the Speed of Neural Networks on CPUs

Vincent Vanhoucke, A. Senior, Mark Z. Mao - 2011

4 papers in library cite

It's good and it's interesting, but I don't think it adds a ton. Good read, though. I read it because I thought it discussed quantization (it does, but not in the sense of making NNs smaller).

[7]Acoustic Modeling Using Deep Belief Networks

A. Mohamed, G. Dahl, Geoffrey Hinton - 2012

12 papers in library cite

[8]Phone Recognition With the Mean-Covariance Restricted Boltzmann Machine

George E. Dahl, Marc'Aurelio Ranzato, A. Mohamed, Geoffrey E. Hinton - 2010

6 papers in library cite

[9]CUDAMat: A CUDA-based Matrix Class for Python

V. Mnih - 2009

5 papers in library cite

[10]Boosted MMI for Model and Feature-Space Discriminative Training

D. Povey, D. Kanevsky, Brian Kingsbury, Bhuvana Ramabhadran, G. Saon, K. Visweswariah - 2008

4 papers in library cite

[11]Conversational Speech Transcription Using Context-Dependent Deep Neural Networks

F. Seide, G. Li, D. Yu - 2011

4 papers in library cite

[12]Continuous Speech Recognition Using Multilayer Perceptrons With Hidden Markov Models

N. Morgan, H. Bourlard - 1990

3 papers in library cite

[13]Lattice-Based Optimization of Sequence Classification Criteria for Neural-Network Acoustic Modeling

Brian Kingsbury - 2009

3 papers in library cite

[14]Speech Recognition With Segmental Conditional Random Fields: A Summary of the JHU CLSP 2010 Summer Workshop

Geoffrey Zweig, P. Nguyen, D. Van Compernolle, K. Demuynck, L. Atlas, Peter Clark, G. Sell, Mingliang Wang, F. Sha, H. Hermansky, D. Karakos, A. Jansen, S. Thomas, G. S. V. S. Sivaram, S. Bowman, J. Kao - 2011

3 papers in library cite

[15]Deep Belief Networks Using Discriminative Features for Phone Recognition

A. Mohamed, T. N. Sainath, George E. Dahl, Bhuvana Ramabhadran, Geoffrey E. Hinton, M. Picheny - 2011

2 papers in library cite

[16]Semi-Tied Covariance Matrices for Hidden Markov Models

M. Gales - 1999

2 papers in library cite

Cited by: 6 papers in your library

Cites: 7 papers in your library

Read on October 21, 2025

It's not bad; it's just nothing really new. They take existing methods and apply them to very large datasets. I see the contribution, but it's a boring read: just experimental methodology and results.
