2016

Can Active Memory Replace Attention?

Lukasz Kaiser, Samy Bengio

AI summary

This paper introduces an extension of the Neural GPU, an active memory model, and applies it to neural machine translation. It clarifies the relationship between attention and active memory and shows that the extended model matches existing attention models on translation while generalizing better to longer sentences.

Main Contributions

  • The paper introduces an extension of the Neural GPU model for neural machine translation.
  • It provides insights into the relationship between attention mechanisms and active memory models.
  • The model demonstrates competitive performance in machine translation tasks compared to attention models.
  • It shows that active memory models are less sensitive to sentence length than attention-based models.
  • The paper identifies a recurrent dependence in the output distribution, where each output depends on previously generated outputs, as important for improved performance.

Abstract

Several mechanisms to focus attention of a neural network on selected parts of its input or memory have been used successfully in deep learning models in recent years. Attention has improved image classification, image captioning, speech recognition, generative models, and learning algorithmic tasks, but it has probably had the largest impact on neural machine translation. Recently, similar improvements have been obtained using alternative mechanisms that do not focus on a single part of a memory but operate on all of it in parallel, in a uniform way. Such a mechanism, which we call active memory, has improved over attention in algorithmic tasks, image processing, and generative modelling. So far, however, active memory has not improved over attention for most natural language processing tasks, in particular for machine translation. We analyze this shortcoming in this paper and propose an extended model of active memory that matches existing attention models on neural machine translation and generalizes better to longer sentences. We investigate this model and explain why previous active memory models did not succeed. Finally, we discuss when active memory brings the most benefits and where attention can be a better choice.
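
The distinction the abstract draws can be illustrated with a minimal NumPy sketch. This is not the paper's model: the function names, shapes, dot-product scoring, and the plain tanh convolution are assumptions chosen for illustration only. An attention read computes a softmax mask over memory positions and returns a weighted sum, leaving the memory unchanged. An active memory step rewrites every memory position in parallel using the same kernel.

```python
import numpy as np


def attention_read(memory, query):
    """Focus on selected parts of memory: softmax-weighted sum of its rows.

    memory: (n, d) array, query: (d,) array. The memory is not modified.
    """
    scores = memory @ query                      # one score per position, (n,)
    scores = scores - scores.max()               # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ memory, weights             # read vector (d,), mask (n,)


def active_memory_step(memory, kernel):
    """Update all memory positions at once with one shared kernel.

    kernel: (width, d, d_out) array with odd width (hypothetical layout).
    Zero padding keeps the number of positions unchanged.
    """
    width, _, d_out = kernel.shape
    pad = width // 2
    n = memory.shape[0]
    padded = np.pad(memory, ((pad, pad), (0, 0)))
    new_memory = np.zeros((n, d_out))
    for offset in range(width):
        # Each kernel slice is applied uniformly to every position.
        new_memory += padded[offset:offset + n] @ kernel[offset]
    return np.tanh(new_memory)


rng = np.random.default_rng(0)
memory = rng.normal(size=(6, 4))        # 6 positions, 4 features each
query = rng.normal(size=4)
kernel = rng.normal(size=(3, 4, 4)) * 0.1

read, weights = attention_read(memory, query)
updated = active_memory_step(memory, kernel)
```

In this sketch the attention read yields a single vector of size 4, while the active memory step yields a new memory of the same shape as the old one. That shape difference reflects the abstract's description of operating on all of the memory rather than focusing on a part of it.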
