2013

On Rectified Linear Units for Speech Processing

Matthew D. Zeiler, M. A. Ranzato, R. Monga, M. Mao, K. Yang, Quoc Le, P. Nguyen, A. Senior, Vincent Vanhoucke, Jeffrey Dean

Cite Score

31

AI summary

This paper introduces Hinge Deep Neural Networks (HDNNs), deep networks whose logistic units are replaced by Rectified Linear Units (ReLUs), for speech processing. Compared with logistic networks of the same topology, HDNNs converge faster, generalize better, and reach lower word error rates on a large vocabulary speech recognition task trained in a distributed environment.

Main Contributions

  • Proposes the use of Rectified Linear Units (ReLUs) in deep neural networks for speech processing.
  • Demonstrates that HDNNs can be trained from random initialization without unsupervised pre-training.
  • Shows that HDNNs converge faster and generalize better than logistic networks.
  • Achieves lower word error rates on a large vocabulary speech recognition task than a logistic network with the same topology.
  • Introduces a sparse autoencoder method for unsupervised feature learning using ReLUs.
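
The word error rate named in the contributions above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a recognizer's hypothesis and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the paper's scoring tool; the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); whitespace tokenization is an
    assumption, real scoring tools normalize text more carefully.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example sentences, not data from the paper.
wer_exact = word_error_rate("the cat sat", "the cat sat")
wer_one_deletion = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Here `wer_one_deletion` counts one dropped word against six reference words. Because insertions are counted, the ratio can exceed 1 when the hypothesis is much longer than the reference.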

Abstract

Deep neural networks have recently become the gold standard for acoustic modeling in speech recognition systems. The key computational unit of a deep network is a linear projection followed by a point-wise non-linearity, which is typically a logistic function. In this work, we show that we can improve generalization and make training of deep networks faster and simpler by substituting the logistic units with rectified linear units. These units are linear when their input is positive and zero otherwise. In a supervised setting, we can successfully train very deep nets from random initialization on a large vocabulary speech recognition task achieving lower word error rates than using a logistic network with the same topology. Similarly in an unsupervised setting, we show how we can learn sparse features that can be useful for discriminative tasks. All our experiments are executed in a distributed environment using several hundred machines and several hundred hours of speech data.
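
The computational unit described in the abstract, a linear projection followed by a point-wise non-linearity, can be sketched in a few lines. This is a minimal illustration in plain Python, not the paper's distributed implementation; the helper names (`layer`, `relu`, `logistic`) and the toy weights are assumptions chosen for the example.

```python
import math


def logistic(x):
    """Logistic unit: squashes any input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified linear unit: linear for positive input, zero otherwise."""
    return x if x > 0.0 else 0.0


def layer(inputs, weights, biases, activation):
    """Linear projection followed by a point-wise non-linearity.

    weights[j] holds the input weights of output unit j.
    """
    outputs = []
    for weight_row, bias in zip(weights, biases):
        pre_activation = sum(w * x for w, x in zip(weight_row, inputs)) + bias
        outputs.append(activation(pre_activation))
    return outputs


# Hypothetical toy values, not taken from the paper.
x = [1.0, -2.0]
W = [[0.5, -0.25], [-1.0, 0.5]]
b = [0.0, 0.0]

relu_out = layer(x, W, b, relu)          # pre-activations are 1.0 and -2.0
logistic_out = layer(x, W, b, logistic)
```

With these values the second unit's pre-activation is negative, so the rectified output is exactly zero while the logistic output is small but non-zero. Swapping the `activation` argument is the only change between the two network types, which matches the abstract's comparison of networks with the same topology.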
