Papperoni

2012

Sequence Transduction With Recurrent Neural Networks

Alex Graves

Cite Score: 58

AI summary

This paper introduces an end-to-end probabilistic sequence transduction system using recurrent neural networks that is, in principle, capable of transforming any input sequence into any finite, discrete output sequence. It demonstrates strong performance on the TIMIT speech corpus for phoneme recognition, achieving a 23.2% phoneme error rate.

Main Contributions

Introduces an end-to-end, probabilistic sequence transduction system based on RNNs.
The system is able to transform any input sequence into any finite, discrete output sequence.
The method jointly models input-output and output-output dependencies.
Achieves strong results on TIMIT phoneme recognition using a recurrent neural network, with a 23.2% phoneme error rate.
The system integrates acoustic and linguistic information during a speech recognition task.

Abstract

Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech, to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However, RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.
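To make "no pre-defined alignment" concrete: the transducer scores an output sequence by summing over every way of interleaving label emissions with "blank" steps that advance through the input. The sketch below is a minimal, illustrative version of that forward recursion in log space. It assumes the per-node blank and next-label log-probabilities have already been computed by some network; the function names and the nested-list layout are hypothetical choices for this example, not taken from the paper.

```python
import math

NEG_INF = float("-inf")


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def transducer_log_likelihood(log_blank, log_label):
    """Sum over all alignments on a T x (U+1) grid.

    Assumed layout (hypothetical, for illustration only):
      log_blank[t][u]: log-prob of emitting blank at input step t
                       after u labels have been output (u = 0..U).
      log_label[t][u]: log-prob of emitting the correct (u+1)-th label
                       at input step t after u labels (u = 0..U-1).
    Returns log Pr(y | x).
    """
    T = len(log_blank)
    U = len(log_blank[0]) - 1
    # alpha[t][u]: log-prob of having output the first u labels
    # by the time input step t is reached.
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive by a blank (advance in the input) ...
            from_blank = alpha[t - 1][u] + log_blank[t - 1][u] if t > 0 else NEG_INF
            # ... or by emitting a label (advance in the output).
            from_label = alpha[t][u - 1] + log_label[t][u - 1] if u > 0 else NEG_INF
            alpha[t][u] = logaddexp(from_blank, from_label)
    # A final blank from the last grid node terminates the sequence.
    return alpha[T - 1][U] + log_blank[T - 1][U]


# Toy example: 2 input steps, 1 output label, every probability 0.5.
half = math.log(0.5)
toy_blank = [[half, half], [half, half]]
toy_label = [[half], [half]]
toy_prob = math.exp(transducer_log_likelihood(toy_blank, toy_label))
print(toy_prob)
```

In the toy example there are two valid alignments (emit the label at the first input step, or at the second), each with probability 0.5 × 0.5 × 0.5, so the total is 0.25. Training would maximise this log-likelihood; that part is omitted here.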

References [19]

[1]Long Short-Term Memory

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

LSTMs FTW!

[2]Gradient-Based Learning Applied to Document Recognition

Yann Lecun, Leon Bottou, Yoshua Bengio, Patrick Haffner - 1998

62 papers in library cite

I absolutely hated this paper. It has ~50 pages but feels like 200. It takes too long to explain some things and often just repeats itself. It also doesn't seem to add much on top of LeNet-5, and it focuses a lot on GTNs, which really didn't stick.

[3]Bidirectional Recurrent Neural Networks

M. Schuster, Kuldip K. Paliwal - 1997

10 papers in library cite

Introduced the BRNN concept. Good paper.

[4]Recurrent Neural Network Based Language Model

Tomas Mikolov, M. Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2010

36 papers in library cite

The comeback of RNNs for language modeling. Not too exciting but impactful and a short read.

[5]Connectionist Temporal Classification: Labelling Unsegmented Sequence Data With Recurrent Neural Networks

Alex Graves, Santiago Fernandez, Faustino Gomez, Jürgen Schmidhuber - 2006

7 papers in library cite

It's a bit lukewarm. Nice idea but the execution was a bit meh. I also think the prefix search was unnecessarily complex and loses to beam search (as they admit later on).

[6]Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies

Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, Jürgen Schmidhuber - 2001

16 papers in library cite

Wow, this is so much better than the other paper - I should have read it sooner. It's concise and not too abstract, and also gives very good context on RNN problems and how to solve them.

[7]Generating Text With Recurrent Neural Networks

Ilya Sutskever, James Martens, Geoffrey E. Hinton - 2011

13 papers in library cite

Pleasant paper but the results are underwhelming. They use RNNs for character-level modeling, which is different. They also use the Hessian-free method proposed by Martens, but don't go too deep into how it works, which is nice because otherwise it would be very mathy. Other papers cite this more as an example of usage than as an actual milestone.

[8]Offline Handwriting Recognition With Multidimensional Recurrent Neural Networks

Alex Graves, Jürgen Schmidhuber - 2009

5 papers in library cite

It's okay. The method is nice but a bit too convoluted.

[9]Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity

Ronald J. Williams, David Zipser - 1992

8 papers in library cite

Oof, this was a hard read. I had to force myself to finish. I think this is too theoretical and deals with RNNs that are not really used nowadays. I skimmed through the second half.

[10]Framewise Phoneme Classification With Bidirectional LSTM and Other Neural Network Architectures

Alex Graves, Jürgen Schmidhuber - 2005

14 papers in library cite

Very nice paper! Simple, no bullshit. Just "hey, we have LSTM and we have BRNN, let's try to join them."

[11]Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

J. Lafferty, Andrew Mccallum, F. C. Pereira - 2001

6 papers in library cite

[12]Phone Recognition With the Mean-Covariance Restricted Boltzmann Machine

George E. Dahl, Marc'aurelio Ranzato, A. Mohamed, Geoffrey E. Hinton - 2010

6 papers in library cite

[13]Speaker-Independent Phone Recognition Using Hidden Markov Models

K. F. Lee, H. W. Hon - 1989

5 papers in library cite

[14]The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT)

DARPA-ISTO - 1990

5 papers in library cite

[15]Unconstrained Online Handwriting Recognition With Recurrent Neural Networks

Alex Graves, Santiago Fernandez, M. Liwicki, H. Bunke, Jürgen Schmidhuber - 2008

5 papers in library cite

[16]An Analysis of Noise in Recurrent Neural Networks: Convergence and Generalization

K. C. Jim, C. L. Giles, B. G. Horne - 1996

4 papers in library cite

[17]Global Training of Document Processing Systems Using Graph Transformer Networks

Leon Bottou, Yoshua Bengio, Yann Lecun - 1997

2 papers in library cite

[18]Long Short-Term Memory in Recurrent Neural Networks

F. Gers - 2001

1 paper in library cites

[19]Rejection Strategies for Offline Handwritten Text Line Recognition

R. Bertolami, M. Zimmermann, H. Bunke - 2006

1 paper in library cites

Cited by 7 papers in your library

Cites 10 papers in your library

Read on April 29, 2025

Good contribution. Discusses transducing (converting one sequence to the other) without pre-defined alignment. I didn't really like it as it is too mathy and a bit hard to understand, and I think it was not too impactful.
