Papperoni

2013

On Rectified Linear Units for Speech Processing

Matthew D. Zeiler, M. A. Ranzato, R. Monga, M. Mao, K. Yang, Quoc Le, P. Nguyen, A. Senior, Vincent Vanhoucke, Jeffrey Dean

citations

Cite Score

31

AI summary

This paper introduces Hinge Deep Neural Networks (HDNNs) using Rectified Linear Units (ReLUs) for speech processing, demonstrating faster convergence, better generalization, and lower word error rates compared to traditional logistic networks, achieving state-of-the-art results on a large vocabulary speech recognition task with distributed training.

Main Contributions

Proposes the use of Rectified Linear Units (ReLUs) in deep neural networks for speech processing.
Demonstrates that HDNNs can be trained from random initialization without unsupervised pre-training.
Shows that HDNNs converge faster and generalize better than logistic networks.
Achieves lower word error rates on a large vocabulary speech recognition task using HDNNs.
Introduces a sparse autoencoder method for unsupervised feature learning using ReLUs.

Abstract

Deep neural networks have recently become the gold standard for acoustic modeling in speech recognition systems. The key computational unit of a deep network is a linear projection followed by a point-wise non-linearity, which is typically a logistic function. In this work, we show that we can improve generalization and make training of deep networks faster and simpler by substituting the logistic units with rectified linear units. These units are linear when their input is positive and zero otherwise. In a supervised setting, we can successfully train very deep nets from random initialization on a large vocabulary speech recognition task achieving lower word error rates than using a logistic network with the same topology. Similarly in an unsupervised setting, we show how we can learn sparse features that can be useful for discriminative tasks. All our experiments are executed in a distributed environment using several hundred machines and several hundred hours of speech data.
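As a rough illustration of the computational unit the abstract describes (a linear projection followed by a point-wise non-linearity), the sketch below compares a rectified linear unit with a logistic unit on the same layer. It is a minimal pure-Python sketch, not the paper's implementation: the function names `relu`, `logistic`, and `layer`, and the toy weights and inputs, are assumptions made up for this example.

```python
import math


def relu(x):
    """Rectified linear unit: linear for positive input, zero otherwise."""
    return x if x > 0.0 else 0.0


def logistic(x):
    """Logistic (sigmoid) unit, the non-linearity the abstract calls typical."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases, nonlinearity):
    """Apply a linear projection, then a point-wise non-linearity."""
    outputs = []
    for row, bias in zip(weights, biases):
        pre_activation = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(nonlinearity(pre_activation))
    return outputs


# Hypothetical toy values, not taken from the paper.
x = [1.0, -2.0]
W = [[0.5, -0.25], [-1.0, 0.5]]
b = [0.0, 0.0]

relu_out = layer(x, W, b, relu)          # pre-activations are 1.0 and -2.0
logistic_out = layer(x, W, b, logistic)  # every output lies strictly in (0, 1)

print(relu_out)
print(logistic_out)
```

With these toy values the second unit receives a negative pre-activation, so the rectified version outputs exactly zero while the logistic version outputs a small positive number. Exact zeros of this kind are what make the resulting features sparse.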

References [16]

[1]ImageNet Classification With Deep Convolutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton - 2012

71 papers in library cite

I'm giving this a 5 just because of the impact, but this is VEEERY derivative of earlier work. Kudos to them for putting it all together, but really there's nothing revolutionary here.

[2]Rectified Linear Units Improve Restricted Boltzmann Machines

V. Nair, Geoffrey E. Hinton - 2010

18 papers in library cite

I hate when people introduce a new idea but don't care to explain it! This is terrible compared to Bengio's paper.

[3]Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

John Duchi, Elad Hazan, Yoram Singer - 2011

19 papers in library cite

I actually skimmed through most of this. It's not a bad paper, but it's a math paper, not AI.

[4]Deep Sparse Rectifier Neural Networks

Xavier Glorot, Antoine Bordes, Yoshua Bengio - 2011

17 papers in library cite

How can you not love Bengio? This paper is everything that Hinton's is not. Really well explained, surprising results, good comparisons... Amazing impact!

[5]Training Products of Experts by Minimizing Contrastive Divergence

Geoffrey Hinton - 2002

23 papers in library cite

Good read, but I think I need to revisit it after I understand RBMs better.

[6]Large Scale Distributed Deep Networks

Jeffrey Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, A. Senior, P. Tucker, K. Yang, Andrew Y. Ng - 2012

16 papers in library cite

Good paper, nice algorithm. Nothing too crazy, but I understand the impact. I think the work to create the system was larger than the algorithm itself.

[7]Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition

Navdeep Jaitly, P. Nguyen, A. Senior, Vincent Vanhoucke - 2012

6 papers in library cite

It's not bad, it's just nothing new really. They just take existing methods and apply them to very large datasets. I see the contribution, but it's a boring read: just experiment methodology and results.

[8]Deep Belief Networks for Phone Recognition

A. Mohamed, G. Dahl, Geoffrey Hinton - 2009

3 papers in library cite

Cited by Hinton in the paper with Google, IBM, and Microsoft.

[9]Auto-Encoder Bottleneck Features Using Deep Belief Networks

T. N. Sainath, Brian Kingsbury, Bhuvana Ramabhadran - 2012

3 papers in library cite

[10]Fast Inference in Sparse Coding Algorithms With Applications to Object Recognition

Koray Kavukcuoglu, Marc'Aurelio Ranzato, Yann LeCun - 2008

3 papers in library cite

[11]Learning Fast Approximations of Sparse Coding

K. Gregor, Yann LeCun - 2010

3 papers in library cite

[12]Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-Free Optimization

Brian Kingsbury, T. N. Sainath, H. Soltau - 2012

3 papers in library cite

[13]Improved Pre-Training of Deep Belief Networks Using Sparse Encoding Symmetric Machines

C. Plahl, T. N. Sainath, Bhuvana Ramabhadran, D. Nahamoo - 2012

2 papers in library cite

[14]Parallel Training of Deep Stacking Networks

L. Deng, B. Hutchinson, D. Yu - 2012

1 paper in library cites

[15]Sparse Coding via Thresholding and Local Competition in Neural Circuits

C. J. Rozell, D. H. Johnson, R. G. Baraniuk, Bruno A. Olshausen - 2008

1 paper in library cites

[16]Unsupervised Learning of Feature Hierarchies

Marc'Aurelio Ranzato - 2009

1 paper in library cites

Cited by 3 papers in your library

Cites 8 papers in your library

Read on August 3, 2025

It's boring, and the only thing it brings to the table is the use of ReLU in speech recognition (which is unsurprising, since it was already used everywhere else).
