Papperoni

2016

Neural GPUs Learn Algorithms

Lukasz Kaiser, Ilya Sutskever

Cite Score: 20

AI summary

This paper introduces the Neural GPU, a parallel neural network architecture based on convolutional gated recurrent units, and demonstrates its ability to learn and generalize algorithmic tasks such as long binary addition and multiplication, achieving perfect accuracy on inputs much longer than those seen during training.
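To make "convolutional gated recurrent unit" concrete, here is a minimal sketch of one such update in plain Python. It assumes the standard GRU form (update gate, reset gate, tanh candidate) with each matrix product replaced by a convolution. The state is simplified to a one-dimensional, single-channel list with width-3 kernels, and the names `conv1d_same`, `cgru_step`, `run`, and the `params` layout are hypothetical, chosen for illustration rather than taken from the paper's code.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_same(state, kernel, bias):
    """Width-3 convolution over a 1-D single-channel state, zero padded."""
    padded = [0.0] + state + [0.0]
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel)) + bias
        for i in range(len(state))
    ]


def cgru_step(state, params):
    """One gated update; each cell depends only on its local neighbourhood,
    so all cells could be computed in parallel."""
    # Update gate u and reset gate r, both computed by convolution.
    u = [sigmoid(v) for v in conv1d_same(state, params["ku"], params["bu"])]
    r = [sigmoid(v) for v in conv1d_same(state, params["kr"], params["br"])]
    # Candidate state from the reset-gated input.
    gated = [ri * si for ri, si in zip(r, state)]
    cand = [math.tanh(v) for v in conv1d_same(gated, params["kc"], params["bc"])]
    # Convex combination of the old state and the candidate.
    return [ui * si + (1.0 - ui) * ci for ui, si, ci in zip(u, state, cand)]


def run(state, params, steps):
    """Apply the same step repeatedly (weights shared across steps)."""
    for _ in range(steps):
        state = cgru_step(state, params)
    return state


rng = random.Random(0)
# Untrained, randomly initialised kernels: this only demonstrates the data flow.
params = {name: [rng.uniform(-1, 1) for _ in range(3)] for name in ("ku", "kr", "kc")}
params.update({"bu": 0.0, "br": 0.0, "bc": 0.0})

state = [1.0, 0.0, 1.0, 1.0, 0.0]
out = run(state, params, steps=len(state))  # step count is an arbitrary choice here
```

Because the same small kernels are applied at every position, the parameter count does not depend on the input length, which is what lets a unit like this be run on longer inputs than it was trained on.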

Main Contributions

Introduces the Neural GPU architecture, a parallel and trainable neural network for learning algorithms.
Demonstrates the Neural GPU's ability to learn long binary multiplication, a superlinear-time algorithm.
Introduces parameter sharing relaxation, a technique for training deep recurrent networks.
Achieves perfect generalization on algorithmic tasks, such as addition and multiplication, even for inputs much longer than training data.
Shows that dropout and gradient noise have a positive impact on learning and generalization in Neural GPUs.

Abstract

Learning an algorithm from examples is a fundamental problem that has been widely studied. It has been addressed using neural networks too, in particular by Neural Turing Machines (NTMs). These are fully differentiable computers that use backpropagation to learn their own programming. Despite their appeal, NTMs have a weakness that is caused by their sequential nature: they are not parallel and are hard to train due to their large depth when unfolded. We present a neural network architecture to address this problem: the Neural GPU. It is based on a type of convolutional gated recurrent unit and, like the NTM, is computationally universal. Unlike the NTM, the Neural GPU is highly parallel, which makes it easier to train and efficient to run. An essential property of algorithms is their ability to handle inputs of arbitrary size. We show that the Neural GPU can be trained on short instances of an algorithmic task and successfully generalize to long instances. We verified it on a number of tasks including long addition and long multiplication of numbers represented in binary. We train the Neural GPU on numbers with up to 20 bits and observe no errors whatsoever while testing it, even on much longer numbers. To achieve these results we introduce a technique for training deep recurrent networks: parameter sharing relaxation. We also found a small amount of dropout and gradient noise to have a large positive effect on learning and generalization.
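The train-short, test-long setup described in the abstract can be sketched as a data generator for binary addition. This is an illustration under assumptions, not the authors' pipeline: the little-endian bit encoding, the test length of 200 bits, and the names `addition_example`, `sequence_errors`, and `exact_adder` are all hypothetical. Only the 20-bit training limit comes from the abstract.

```python
import random


def to_bits(n, width):
    """Little-endian list of bits of n, zero-padded to `width`."""
    return [(n >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def addition_example(num_bits, rng):
    """Return (a, b, a + b) as bit lists; the sum gets one extra carry bit."""
    a = rng.getrandbits(num_bits)
    b = rng.getrandbits(num_bits)
    return to_bits(a, num_bits), to_bits(b, num_bits), to_bits(a + b, num_bits + 1)


def sequence_errors(predict, examples):
    """Count examples where any output bit is wrong (whole-sequence errors)."""
    return sum(1 for a, b, target in examples if predict(a, b) != target)


def exact_adder(a, b):
    """Stand-in for a trained model: a reference adder that is always right."""
    return to_bits(from_bits(a) + from_bits(b), len(a) + 1)


rng = random.Random(0)
# Training instances: operand lengths drawn from 1..20 bits, as in the abstract.
train = [addition_example(rng.randint(1, 20), rng) for _ in range(1000)]
# Test instances: much longer operands (200 bits is an arbitrary choice).
test = [addition_example(200, rng) for _ in range(100)]

print(sequence_errors(exact_adder, test))
```

Counting a sequence as wrong if any single bit is wrong is the strict criterion that a claim of "no errors whatsoever" implies; a per-bit accuracy would hide rare carry mistakes on long inputs.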

References [31]

[1]Adam: A Method for Stochastic Optimization

D. P. Kingma, Jimmy Lei Ba - 2014

49 papers in library cite

Amazing paper! Very well explained and huge impact. I am amazed that they made something so simple even when it requires a lot of background mathematical knowledge

[2]ImageNet Classification With Deep Convolutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton - 2012

71 papers in library cite

I'm giving this a 5 just because of the impact, but this is VEEERY derivative of earlier work. Kudos to them for putting it all together, but really there's nothing revolutionary here.

[3]Long Short-Term Memory

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

LSTMs FTW!

[4]Neural Machine Translation by Jointly Learning to Align and Translate

D. Bahdanau, Kyunghyun Cho, Yoshua Bengio - 2014

59 papers in library cite

Introduces the attention mechanism - amazing overall

[5]Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation

Kyunghyun Cho, B. V. Merrienboer, C. G. Gulcehre, D. Bahdanau, F. Bougares, Holger Schwenk, Yoshua Bengio - 2014

38 papers in library cite

Introduces RNN encoder-decoder. I love it :)

[6]Sequence to Sequence Learning With Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le - 2014

58 papers in library cite

Good paper, but I think it only got famous because they set a new good baseline for NNs in MT. Their main contribution was reversing the source sentence TBH.

[7]Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

J. Chung, C. G. Gulcehre, Kyunghyun Cho, Yoshua Bengio - 2014

11 papers in library cite

It's a good paper but results are veeeery underwhelming.

[8]Show and Tell: A Neural Image Caption Generator

Dumitru Erhan - 2015

11 papers in library cite

It's nice and they beat a ton of SotA. However, I read the one that uses attention first so this is a bit less surprising.

[9]LSTM: A Search Space Odyssey

K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, Jürgen Schmidhuber - 2015

4 papers in library cite

Very good review of different architectural choices for LSTMs. It actually brings some nice insights (as opposed to the other paper that compares GRUs and LSTMs).

[10]Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

G. Dahl, D. Yu, L. Deng, Alex Acero - 2012

19 papers in library cite

Good paper, very well written and probably the best explanation of RBMs and DBNs I've seen. However, I don't see a lot of impact and seems very derivative from other works.

[11]Neural Turing Machines

Alex Graves, G. Wayne, Ivo Danihelka - 2014

18 papers in library cite

This paper is amazing. If someone told me that NNs could use and address memory by position I wouldn't believe it worked. Very nice, but it's a shame that it's just a toy example.

[12]Recurrent Continuous Translation Models

N. Kalchbrenner, Phil Blunsom - 2013

27 papers in library cite

Good paper, probably the first that used an encoder-decoder. But they used a conv. NN instead of a traditional decoder, which I don't really like.

[13]Grammar as a Foreign Language

Geoffrey Hinton - 2015

9 papers in library cite

It's a nice paper showing that attention can be used for parsing. However, parsing is boring and is very derivative. Good paper nonetheless.

[14]Inferring Algorithmic Patterns With Stack-Augmented Recurrent Nets

Armand Joulin, Tomas Mikolov - 2015

9 papers in library cite

Very underwhelming TBH. I expected more after reading the Neural Turing Machine paper. This reads like "yeah, we lost the race, here's what we were doing before they did something better"

[15]Highway Networks

R. K. Srivastava, K. Greff, Jürgen Schmidhuber - 2015

6 papers in library cite

Introduced highway networks, which seem like a precursor to resnets

[16]Fast Algorithms for Convolutional Neural Networks

A. Lavin - 2015

3 papers in library cite

Optimization trick

[17]Dropout Improves Recurrent Neural Networks for Handwriting Recognition

V. Pham, T. Bluche, C. Kermorvant, J. Louradour - 2014

5 papers in library cite

Dropout for RNNs

[18]Learning to Execute

Wojciech Zaremba, Ilya Sutskever - 2014

8 papers in library cite

They try to execute python code

[19]Listen, Attend and Spell

W. Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals - 2015

4 papers in library cite

Speech recog + attention

[20]Grid Long Short-Term Memory

N. Kalchbrenner, Ivo Danihelka, Alex Graves - 2016

3 papers in library cite

[21]Learning to Transduce With Unbounded Memory

Edward Grefenstette, K. Hermann, M. Suleyman, Phil Blunsom - 2015

5 papers in library cite

[22]Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting

X. Shi, Ziru Chen, Haiming Wang, D. Y. Yeung, W. K. Wong, W. C. Woo - 2015

2 papers in library cite

[23]Inductive Programming: A Survey of Program Synthesis Techniques

E. Kitzelmann - 2010

2 papers in library cite

[24]Variable Rate Image Compression With Recurrent Neural Networks

G. Toderici, S. M. O'malley, S. J. Hwang, D. Vincent, D. Minnen, S. Baluja, M. Covell, R. Sukthankar - 2016

2 papers in library cite

[25]An Introduction to Cellular Automata

H. Vivien - 2003

1 paper in library cites

[26]Automatic Structures

A. Blumensath, E. Grädel - 2000

1 paper in library cites

[27]Bayesian Learning via Stochastic Gradient Langevin Dynamics

M. Welling, Yee Whye Teh - 2011

1 paper in library cites

[28]Dimensions in Program Synthesis

S. Gulwani - 2010

1 paper in library cites

[29]Learning Games From Videos Guided by Descriptive Complexity

Lukasz Kaiser - 2012

1 paper in library cites

[30]Learning Regular Sets From Queries and Counterexamples

D. Angluin - 1987

1 paper in library cites

[31]Reinforcement Learning Neural Turing Machines

Wojciech Zaremba, Ilya Sutskever - 2015

1 paper in library cites

Cited by 5 papers in your library

Cites 20 papers in your library

Read on October 17, 2025

Results and architecture are nice but TBH it is very poorly written... Doesn't seem like a lot of impact either.
