2014

Recurrent Models of Visual Attention

V. Mnih, N. Heess, Alex Graves

citations

Cite Score

74

AI summary

This paper introduces the Recurrent Attention Model (RAM), a recurrent neural network (RNN) that extracts information from an image or video by adaptively selecting a sequence of regions to process at high resolution. Because the selection step is non-differentiable, the model is trained with reinforcement learning. RAM outperforms a convolutional neural network baseline on cluttered image classification tasks and learns to track a simple object on a dynamic visual control problem.

Main Contributions

  • Introduces a novel recurrent neural network (RNN) model for visual attention, called RAM.
  • The model processes inputs sequentially, attending to different locations within the images (or video frames) one at a time.
  • The amount of computation it performs can be controlled independently of the input image size.
  • An end-to-end optimization procedure allows the model to be trained directly with respect to a given task and to maximize a performance measure which may depend on the entire sequence of decisions made by the model.
  • Demonstrates that RAM significantly outperforms a convolutional neural network baseline on cluttered image classification tasks, and that on a dynamic visual control problem it learns to track a simple object without an explicit training signal for doing so.
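The claim that computation can be controlled independently of image size can be illustrated with a minimal sketch. It assumes a simplified glimpse that is a single fixed-size crop around a chosen location; the paper's actual sensor may differ. The names `extract_glimpse`, `pixels_processed`, and `make_image` are hypothetical helpers introduced only for this illustration.

```python
def extract_glimpse(image, center, size):
    """Crop a size x size patch whose top-left corner is center - size // 2.

    Positions that fall outside the image are filled with zeros.
    (Assumption: a single-resolution crop stands in for the model's sensor.)
    """
    rows, cols = len(image), len(image[0])
    r0 = center[0] - size // 2
    c0 = center[1] - size // 2
    patch = []
    for r in range(r0, r0 + size):
        row = []
        for c in range(c0, c0 + size):
            inside = 0 <= r < rows and 0 <= c < cols
            row.append(image[r][c] if inside else 0)
        patch.append(row)
    return patch


def pixels_processed(num_glimpses, size):
    """Pixels read over a whole episode: depends only on glimpse count and size."""
    return num_glimpses * size * size


def make_image(side):
    """Build a synthetic side x side grayscale image (illustrative data only)."""
    return [[(r * side + c) % 256 for c in range(side)] for r in range(side)]


# The same three 8x8 glimpses are taken from a 32x32 and a 256x256 image.
locations = [(4, 4), (16, 20), (30, 30)]
for side in (32, 256):
    image = make_image(side)
    patches = [extract_glimpse(image, loc, 8) for loc in locations]
    read = sum(len(p) * len(p[0]) for p in patches)
    print(side * side, "pixels in image,", read, "pixels read")
```

In this sketch the number of pixels read is fixed by the glimpse count and glimpse size, whereas a model that processes every pixel would do work proportional to the full image area.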

Abstract

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
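To make the idea of training a non-differentiable location choice with reinforcement learning concrete, here is a toy sketch. It assumes a score-function (REINFORCE-style) policy gradient with a running-average baseline, a one-dimensional location, and a made-up reward; none of these specifics are taken from the abstract, and the function name `train_location_policy` and all default values are hypothetical.

```python
import random


def train_location_policy(target=0.5, sigma=0.3, lr=0.005, steps=3000, seed=0):
    """Toy policy-gradient loop: learn the mean of a Gaussian over a 1-D location.

    Assumptions for illustration only: the reward is 1 when the sampled
    location lands within 0.2 of `target` and 0 otherwise.
    """
    rng = random.Random(seed)
    mu = 0.0        # policy mean, the only learned parameter
    baseline = 0.0  # running average of reward, used to reduce variance
    for _ in range(steps):
        # Sampling a location is the non-differentiable step.
        loc = rng.gauss(mu, sigma)
        reward = 1.0 if abs(loc - target) < 0.2 else 0.0
        # Gradient of log N(loc; mu, sigma^2) with respect to mu.
        grad_log_prob = (loc - mu) / sigma ** 2
        # Score-function update: ascend (reward - baseline) * grad log-probability.
        mu += lr * (reward - baseline) * grad_log_prob
        baseline += 0.05 * (reward - baseline)
    return mu


learned_mu = train_location_policy()
print("learned location mean:", round(learned_mu, 3))
```

The update never differentiates through the sampling step itself; it only needs the gradient of the log-probability of the sampled location, which is why this family of methods suits models whose decisions are discrete or sampled.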



