2016

Neural GPUs Learn Algorithms

Lukasz Kaiser, Ilya Sutskever

citations

Cite Score

20

AI summary

This paper introduces the Neural GPU, a parallel neural network architecture based on convolutional gated recurrent units, and demonstrates its ability to learn and generalize algorithmic tasks such as long binary addition and multiplication, achieving perfect accuracy on inputs much longer than those seen during training.

Main Contributions

  • Introduces the Neural GPU architecture, a parallel and trainable neural network for learning algorithms.
  • Demonstrates the Neural GPU's ability to learn long binary multiplication, a superlinear-time algorithm.
  • Introduces parameter sharing relaxation, a technique for training deep recurrent networks.
  • Achieves perfect generalization on algorithmic tasks, such as addition and multiplication, even for inputs much longer than training data.
  • Shows that dropout and gradient noise have a positive impact on learning and generalization in Neural GPUs.

Abstract

Learning an algorithm from examples is a fundamental problem that has been widely studied. It has been addressed using neural networks too, in particular by Neural Turing Machines (NTMs). These are fully differentiable computers that use backpropagation to learn their own programming. Despite their appeal NTMs have a weakness that is caused by their sequential nature: they are not parallel and are are hard to train due to their large depth when unfolded. We present a neural network architecture to address this problem: the Neural GPU. It is based on a type of convolutional gated recurrent unit and, like the NTM, is computationally universal. Unlike the NTM, the Neural GPU is highly parallel which makes it easier to train and efficient to run. An essential property of algorithms is their ability to handle inputs of arbitrary size. We show that the Neural GPU can be trained on short instances of an algorithmic task and successfully generalize to long instances. We verified it on a number of tasks including long addition and long multiplication of numbers represented in binary. We train the Neural GPU on numbers with up-to 20 bits and observe no errors whatsoever while testing it, even on much longer numbers. To achieve these results we introduce a technique for training deep recurrent networks: parameter sharing relaxation. We also found a small amount of dropout and gradient noise to have a large positive effect on learning and generalization.

Citation Graph

Loading graph...

References [31]

Sort:
Filter:

D. P. Kingma, Jimmy Lei Ba - 2014

49 papers in library cite

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton - 2012

71 papers in library cite

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

D. Bahdanau, Kyunghyun Cho, Yoshua Bengio - 2014

59 papers in library cite

Kyunghyun Cho, B. V. Merrienboer, C. G. Gulcehre, D. Bahdanau, F. Bougares, Holger Schwenk, Yoshua Bengio - 2014

38 papers in library cite

Ilya Sutskever, Oriol Vinyals, Quoc V. Le - 2014

58 papers in library cite

J. Chung, C. G. Gulcehre, Kyunghyun Cho, Yoshua Bengio - 2014

11 papers in library cite

Dumitru Erhan - 2015

11 papers in library cite

K. Greff, R. K. Srivastava, J. Koutn'ik, B. R. Steunebrink, Jürgen Schmidhuber - 2015

4 papers in library cite

G. Dahl, D. Yu, L. Deng, Alex Acero - 2012

19 papers in library cite

Alex Graves, G. Wayne, Ivo Danihelka - 2014

18 papers in library cite

N. Kalchbrenner, Phil Blunsom - 2013

27 papers in library cite

Geoffrey Hinton - 2015

9 papers in library cite

Armand Joulin, Tomas Mikolov - 2015

9 papers in library cite

R. K. Srivastava, K. Greff, Jürgen Schmidhuber - 2015

6 papers in library cite

A. Lavin - 2015

3 papers in library cite

V. Pham, T. Bluche, C. Kermorvant, J. Louradour - 2014

5 papers in library cite

Wojciech Zaremba, Ilya Sutskever - 2014

8 papers in library cite

W. Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals - 2015

4 papers in library cite

N. Kalchbrenner, Ivo Danihelka, Alex Graves - 2016

3 papers in library cite

Edward Grefenstette, K. Hermann, M. Suleyman, Phil Blunsom - 2015

5 papers in library cite

X. Shi, Ziru Chen, Haiming Wang, D. Y. Yeung, W. K. Wong, W. C. Woo - 2015

2 papers in library cite

E. Kitzelmann - 2010

2 papers in library cite

G. Toderici, S. M. O'malley, S. J. Hwang, D. Vincent, D. Minnen, S. Baluja, M. Covell, R. Sukthankar - 2016

2 papers in library cite

H. Vivien - 2003

1 paper in library cites

A. Blumensath, E. Gradel - 2000

1 paper in library cites

M. Welling, Yee Whye Teh - 2011

1 paper in library cites

S. Gulwani - 2010

1 paper in library cites

Lukasz Kaiser - 2012

1 paper in library cites

D. Angluin - 1987

1 paper in library cites

Wojciech Zaremba, Ilya Sutskever - 2015

1 paper in library cites

Cited by

5

papers in your library

Cites

20

papers in your library

Read

on October 17, 2025

Your review

Tags

Paper Aliases

No aliases