2014

Sequence to Sequence Learning With Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le

Cite Score

94

AI summary

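This paper introduces an end-to-end sequence-to-sequence learning approach in which one multilayered LSTM encodes the input sequence into a fixed-dimensional vector and a second deep LSTM decodes the target sequence from that vector. On the WMT'14 English-to-French translation task, the model achieves a BLEU score of 34.8, compared with 33.3 for a phrase-based SMT baseline. Reversing the word order of source sentences (but not target sentences) markedly improves performance, and the LSTM also learns phrase and sentence representations that are sensitive to word order.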

Main Contributions

  • Introduces a general end-to-end approach to sequence learning using LSTMs.
  • Achieves a BLEU score of 34.8 on the WMT'14 English to French translation task.
  • Demonstrates that reversing the word order of source sentences (but not target sentences) markedly improves LSTM performance.
  • Shows that LSTMs can learn sensible phrase and sentence representations.
  • Finds that deep LSTMs outperform shallow LSTMs.
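
The source-reversal step listed above is a pure preprocessing change, so it can be illustrated without any model code. The following is a minimal sketch, assuming sentences are already tokenized into lists of strings; the function name `reverse_source` and the toy data are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

SentencePair = Tuple[List[str], List[str]]


def reverse_source(pairs: List[SentencePair]) -> List[SentencePair]:
    """Reverse the token order of each source sentence; leave targets unchanged."""
    return [(list(reversed(src)), list(tgt)) for src, tgt in pairs]


# Toy example: the source "a b c" is paired with the target "x y z".
pairs = [(["a", "b", "c"], ["x", "y", "z"])]
reversed_pairs = reverse_source(pairs)

# After reversal the model reads "c b a" and then emits "x y z", so the first
# source token "a" sits right next to the first target token "x".
print(reversed_pairs)
```

The first few source words end up adjacent to the first few target words, which is the kind of short-term dependency the abstract credits with making optimization easier.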

Abstract

Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
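
The reranking experiment mentioned in the abstract can be sketched as rescoring an n-best list. This is a hedged illustration only: it assumes each hypothesis comes with a baseline score and that the final score is a weighted average of that score and a model log-probability. The names `rerank`, `model_score`, `weight`, and the toy scorer are hypothetical; a real setup would score each hypothesis with the trained LSTM.

```python
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[str, float]  # (translation text, baseline system score)


def rerank(
    hypotheses: Sequence[Hypothesis],
    model_score: Callable[[str], float],
    weight: float = 0.5,
) -> List[Hypothesis]:
    """Combine baseline and model scores, then sort best-first.

    Assumption: a simple weighted average; weight=0.5 is an even average.
    """
    rescored = [
        (text, (1.0 - weight) * base + weight * model_score(text))
        for text, base in hypotheses
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


def toy_score(text: str) -> float:
    """Stand-in for a model log-probability: shorter outputs score higher."""
    return -float(len(text.split()))


nbest = [("le chat est assis", -4.0), ("chat assis", -5.0), ("le chat", -6.0)]
best = rerank(nbest, toy_score)[0]
print(best)
```

Because the scorer is passed in as a function, the same routine works with any model that returns a log-probability for a candidate translation.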
