2011

Generating Text With Recurrent Neural Networks

Ilya Sutskever, James Martens, Geoffrey E. Hinton

citations

Cite Score

56

AI summary

This paper introduces the Multiplicative Recurrent Neural Network (MRNN), an RNN architecture for character-level language modeling that is trained with Hessian-Free optimization. The MRNN surpasses the best previous single method for this task, a hierarchical non-parametric sequence model, and generates text with a large vocabulary and a considerable amount of grammatical structure.

Main Contributions

  • Introduces a new RNN variant, the Multiplicative RNN (MRNN), whose multiplicative (or "gated") connections let the current input character determine the transition matrix from one hidden state vector to the next.
  • Demonstrates the power of RNNs trained with Hessian-Free optimization (HF) on character-level language modeling tasks.
  • Surpasses the best previous single method for character-level language modeling, a hierarchical non-parametric sequence model.
  • Represents, to the authors' knowledge, the largest recurrent neural network application to date.
  • Generates text with a significant amount of high-level linguistic structure: a large vocabulary, a considerable amount of grammatical structure, and a wide variety of highly plausible proper names that were not in the training set.

Abstract

Recurrent Neural Networks (RNNs) are very powerful sequence models that do not enjoy widespread use because it is extremely difficult to train them properly. Fortunately, recent advances in Hessian-free optimization have been able to overcome the difficulties associated with training RNNs, making it possible to apply them successfully to challenging sequence problems. In this paper we demonstrate the power of RNNs trained with the new Hessian-Free optimizer (HF) by applying them to character-level language modeling tasks. The standard RNN architecture, while effective, is not ideally suited for such tasks, so we introduce a new RNN variant that uses multiplicative (or "gated") connections which allow the current input character to determine the transition matrix from one hidden state vector to the next. After training the multiplicative RNN with the HF optimizer for five days on 8 high-end Graphics Processing Units, we were able to surpass the performance of the best previous single method for character-level language modeling: a hierarchical non-parametric sequence model. To our knowledge this represents the largest recurrent neural network application to date.
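The abstract describes multiplicative connections only in words, so a small sketch may help make the idea concrete. The block below assumes the factored form commonly attributed to the paper, f_t = (W_fx x_t) * (W_fh h_{t-1}) taken elementwise, h_t = tanh(W_hf f_t + W_hx x_t + b_h), o_t = W_oh h_t + b_o; the abstract itself does not state these equations. The class name `TinyMRNN`, the helper functions, the layer sizes, and the random initialization are all hypothetical choices for illustration, and no training (Hessian-Free or otherwise) is shown.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian weights; the scale is an arbitrary illustrative choice."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def one_hot(index, size):
    v = [0.0] * size
    v[index] = 1.0
    return v


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


class TinyMRNN:
    """Hypothetical, untrained multiplicative RNN cell (forward pass only)."""

    def __init__(self, vocab_size, hidden_size, factor_size, seed=0):
        rng = random.Random(seed)
        self.W_fx = rand_matrix(factor_size, vocab_size, rng)   # input -> factors
        self.W_fh = rand_matrix(factor_size, hidden_size, rng)  # hidden -> factors
        self.W_hf = rand_matrix(hidden_size, factor_size, rng)  # factors -> hidden
        self.W_hx = rand_matrix(hidden_size, vocab_size, rng)   # input -> hidden
        self.W_oh = rand_matrix(vocab_size, hidden_size, rng)   # hidden -> output
        self.b_h = [0.0] * hidden_size
        self.b_o = [0.0] * vocab_size

    def step(self, h_prev, char_index):
        """Advance one character; return (new hidden state, next-char probabilities)."""
        x = one_hot(char_index, len(self.b_o))
        # The input character scales each factor, so the effective
        # hidden-to-hidden transition depends on the current character.
        fx = matvec(self.W_fx, x)
        fh = matvec(self.W_fh, h_prev)
        f = [a * b for a, b in zip(fx, fh)]
        hf = matvec(self.W_hf, f)
        hx = matvec(self.W_hx, x)
        h = [math.tanh(a + b + c) for a, b, c in zip(hf, hx, self.b_h)]
        logits = [a + b for a, b in zip(matvec(self.W_oh, h), self.b_o)]
        return h, softmax(logits)


# Usage: feed two characters from a 4-symbol alphabet.
model = TinyMRNN(vocab_size=4, hidden_size=5, factor_size=3)
h = [0.0] * 5
h, probs = model.step(h, char_index=2)
h, probs = model.step(h, char_index=0)
```

Routing the character-dependent transition through a small set of factors avoids storing a full hidden-to-hidden matrix for every character in the alphabet, which is the usual motivation for this kind of factorization.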

Cited by: 13 papers in your library

Cites: 11 papers in your library

Read on June 21, 2025