2012

Sequence Transduction With Recurrent Neural Networks

Alex Graves

Cite Score

58

AI summary

This paper introduces an end-to-end, probabilistic sequence transduction system built entirely from recurrent neural networks (RNNs) that is, in principle, able to transform any input sequence into any finite, discrete output sequence. On TIMIT phoneme recognition it reports a 23.2% phoneme error rate.

Main Contributions

  • Introduces an end-to-end, probabilistic sequence transduction system based entirely on RNNs.
  • In principle, the system can transform any input sequence into any finite, discrete output sequence, with no pre-defined alignment between the two.
  • Jointly models input-output and output-output dependencies.
  • Reports a 23.2% phoneme error rate on the TIMIT speech corpus.
  • Integrates acoustic and linguistic information in a speech recognition task.
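The error rate quoted above can be made concrete with a small sketch. It assumes the figure is a phoneme error rate in the usual sense: total edit distance between recognised and reference phoneme sequences, divided by the total reference length. This page does not state how the number was computed, and the function names and the toy phoneme labels below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))          # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]                             # deleting i reference labels
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,       # deletion
                            curr[j - 1] + 1,   # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Total edit distance divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Toy example with made-up labels: one substitution and one deletion
# against a five-label reference.
refs = [["sil", "k", "ae", "t", "sil"]]
hyps = [["sil", "k", "ah", "t"]]
per = phoneme_error_rate(refs, hyps)
print(f"{per:.1%}")
```

Normalising by the reference length rather than the hypothesis length is the common convention, so insertions can push the rate above 100%.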

Abstract

Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech, to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However, RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.
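To illustrate how a probabilistic transducer can avoid a pre-defined alignment, the sketch below sums the probability of a target sequence over every possible alignment with a forward recursion on a T x (U+1) lattice. Moving along the input axis emits a blank, and moving along the output axis emits the next target label. This is a minimal sketch of the standard forward pass for RNN transducers; the function names, the argument layout, and the constant toy probabilities are assumptions for illustration and do not come from this page. In a real system the two tables would be produced by the networks.

```python
import math


def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b)), tolerating -inf."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def transducer_log_likelihood(log_blank, log_label):
    """Return log Pr(y | x), marginalised over all alignments.

    log_blank[t][u]: log-probability of emitting blank at input step t
                     after u target labels; shape T x (U + 1).
    log_label[t][u]: log-probability of emitting target label u + 1 at
                     lattice node (t, u); shape T x U.
    """
    T = len(log_blank)
    U = len(log_blank[0]) - 1
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0  # log(1): every alignment starts at node (0, 0)
    for t in range(T):
        for u in range(U + 1):
            if t > 0:  # arrive by emitting blank at (t - 1, u)
                alpha[t][u] = logsumexp2(
                    alpha[t][u], alpha[t - 1][u] + log_blank[t - 1][u])
            if u > 0:  # arrive by emitting a label at (t, u - 1)
                alpha[t][u] = logsumexp2(
                    alpha[t][u], alpha[t][u - 1] + log_label[t][u - 1])
    # A final blank at the last node terminates the output sequence.
    return alpha[T - 1][U] + log_blank[T - 1][U]


# Toy lattice (assumed values): blank has probability 0.5 and the correct
# next label has probability 0.25 at every node.
T, U = 3, 2
log_blank = [[math.log(0.5)] * (U + 1) for _ in range(T)]
log_label = [[math.log(0.25)] * U for _ in range(T)]
ll = transducer_log_likelihood(log_blank, log_label)
print(math.exp(ll))
```

With constant probabilities the result can be checked by hand: there are C(T-1+U, U) = 6 alignments, each with probability 0.5^3 * 0.25^2, so the total should be 0.046875 up to floating-point rounding. Training maximises this log-likelihood, so the alignment is never specified in advance.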

Cited by 7 papers in your library

Cites 10 papers in your library

Read on April 29, 2025
