Papperoni

2014

Recurrent Models of Visual Attention

V. Mnih, N. Heess, Alex Graves, Koray Kavukcuoglu

citations

Cite Score

74

AI summary

This paper introduces a recurrent neural network (RNN) model, the Recurrent Attention Model (RAM), that uses reinforcement learning to extract information from images by adaptively selecting a sequence of regions, outperforming a convolutional neural network baseline on cluttered image classification tasks and learning to track a simple object on a dynamic visual control problem.

Main Contributions

Introduces a novel recurrent neural network (RNN) model for visual attention, called RAM.
The model processes inputs sequentially, attending to different locations within the images (or video frames) one at a time.
The amount of computation it performs can be controlled independently of the input image size.
An end-to-end optimization procedure allows the model to be trained directly with respect to a given task and to maximize a performance measure which may depend on the entire sequence of decisions made by the model.
Demonstrates that RAM significantly outperforms a convolutional neural network baseline on cluttered image classification tasks, and that it learns to track a simple object on a dynamic visual control problem without an explicit training signal for doing so.
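
The claim that computation can be controlled independently of image size can be illustrated with a small sketch. This is not the paper's glimpse sensor: the helper names `extract_glimpse` and `pixels_processed`, the single-resolution square patch, and the zero padding at the borders are all assumptions made for illustration. The point is only that the work per glimpse depends on the patch size and the number of glimpses, not on the image dimensions.

```python
def extract_glimpse(image, center, size):
    """Return a size x size patch of `image` centred on `center` (row, col).

    Hypothetical helper: pixels that fall outside the image are filled with 0.0.
    """
    height, width = len(image), len(image[0])
    top = center[0] - size // 2
    left = center[1] - size // 2
    patch = []
    for r in range(top, top + size):
        row = []
        for c in range(left, left + size):
            inside = 0 <= r < height and 0 <= c < width
            row.append(image[r][c] if inside else 0.0)
        patch.append(row)
    return patch


def pixels_processed(num_glimpses, size):
    """Pixels read over one episode; the image size does not appear."""
    return num_glimpses * size * size


# Two images of different sizes, filled with their flattened pixel index.
small = [[float(r * 28 + c) for c in range(28)] for r in range(28)]
large = [[float(r * 100 + c) for c in range(100)] for r in range(100)]

patch_small = extract_glimpse(small, center=(14, 14), size=8)
patch_large = extract_glimpse(large, center=(50, 50), size=8)
corner = extract_glimpse(small, center=(0, 0), size=8)  # partly off-image
```

Both patches have the same shape, so whatever network consumes them does the same amount of work per step on the 28x28 and the 100x100 image.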

Abstract

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
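
To make the "non-differentiable, but trainable with reinforcement learning" point concrete, here is a toy sketch of the score-function (REINFORCE) estimator from Williams (reference [5]). Everything specific in it is an assumption for illustration and not the paper's setup: the discrete softmax policy over five candidate locations, the 0/1 reward, the running-average baseline, the learning rate, and the function name `train_location_policy`. It shows only that a sampled "where to look" decision can be learned from reward alone, without backpropagating through the sampling step.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_location_policy(num_locations=5, target=3, steps=2000,
                          learning_rate=0.1, seed=0):
    """Learn which location to look at from a 0/1 reward alone (toy setup)."""
    rng = random.Random(seed)
    logits = [0.0] * num_locations
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sampling a location is the non-differentiable step.
        action = rng.choices(range(num_locations), weights=probs)[0]
        reward = 1.0 if action == target else 0.0
        advantage = reward - baseline
        # For a softmax policy: d log pi(a) / d logit_k = 1[k == a] - pi(k).
        for k in range(num_locations):
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += learning_rate * advantage * grad_log_prob
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


final_probs = train_location_policy()
```

After training, `final_probs` should put most of its mass on the rewarded location. In a full attention model the reward would instead come from the downstream task, for example whether the final classification was correct.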

References [26]

[1]ImageNet Classification With Deep Convolutional Neural Networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton - 2012

71 papers in library cite

I'm giving this a 5 just because of the impact, but this is VEEERY derivative of earlier work. Kudos to them for putting it all together, but really there's nothing revolutionary here.

[2]Long Short-Term Memory

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

LSTMs FTW!

[3]Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross Girshick, J. Donahue, Trevor Darrell, Jitendra Malik - 2014

18 papers in library cite

Good results, beat OverFeat, used pretraining to improve performance. Only issue is that the paper is overly long...

[4]Random Search for Hyper-Parameter Optimization

James Bergstra, Yoshua Bengio - 2012

7 papers in library cite

It seems crazy that it was only in 2012 that they found out that random search was good! Still, kudos to them for noticing, and the paper is just so easy to follow and enjoyable!

[5]Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

R. Williams - 1992

11 papers in library cite

It's alright for formalizing the concept, but it's a bit boring and doesn't add a lot from the middle on. Focuses too much on reviewing existing techniques and on stochastic units.

[6]OverFeat: Integrated Recognition, Localization and Detection Using Convolutional Networks

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, Rob Fergus, Yann LeCun - 2014

16 papers in library cite

Very convoluted method, was SotA for only a bit of time, and the paper is very boring.

[7]On Learning Where to Look

Marc'Aurelio Ranzato - 2014

3 papers in library cite

Learning where to look with glimpses

[8]Rapid Object Detection Using a Boosted Cascade of Simple Features

P. Viola, M. J. Jones - 2001

10 papers in library cite

[9]Learning to Combine Foveal Glimpses With a Third-Order Boltzmann Machine

Hugo Larochelle, Geoffrey E. Hinton - 2010

4 papers in library cite

[10]Learning Where to Attend With Deep Architectures for Image Tracking

M. Denil, L. Bazzani, Hugo Larochelle, N. D. Freitas - 2012

4 papers in library cite

[11]Segmentation as Selective Search for Object Recognition

K. E. A. van de Sande, J. R. R. Uijlings, T. Gevers, A. W. M. Smeulders - 2011

3 papers in library cite

[12]Beyond Sliding Windows: Object Localization by Efficient Subwindow Search

C. H. Lampert, M. B. Blaschko, T. Hofmann - 2008

2 papers in library cite

[13]Policy Gradient Methods for Reinforcement Learning With Function Approximation

Richard S. Sutton, D. McAllester, Satinder Singh, Y. Mansour - 2000

2 papers in library cite

[14]Searching for Objects Driven by Context

B. Alexe, N. Heess, Yee Whye Teh, V. Ferrari - 2012

2 papers in library cite

[15]The Dynamic Representation of Scenes

R. A. Rensink - 2000

2 papers in library cite

[16]What Is an Object?

B. Alexe, T. Deselaers, V. Ferrari - 2010

2 papers in library cite

[17]A Model of Saliency-Based Visual Attention for Rapid Scene Analysis

L. Itti, C. Koch, E. Niebur - 1998

1 paper in library cites

[18]Action From Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths

S. Mathe, C. Sminchisescu - 2013

1 paper in library cites

[19]Cascade Object Detection With Deformable Part Models

P. F. Felzenszwalb, R. B. Girshick, D. A. McAllester - 2010

1 paper in library cites

[20]Contextual Guidance of Eye Movements and Attention in Real-World Scenes: The Role of Global Features in Object Search

Antonio Torralba, Aude Oliva, M. S. Castelhano, J. M. Henderson - 2006

1 paper in library cites

[21]Evolving a Roving Eye for Go

K. O. Stanley, R. Miikkulainen - 2004

1 paper in library cites

[22]Eye Movements in Natural Behavior

M. Hayhoe, D. Ballard - 2005

1 paper in library cites

[23]I-POMDP: An Infomax Model of Eye Movement

N. J. Butko, J. R. Movellan - 2008

1 paper in library cites

[24]Optimal Scanning for Faster Object Detection

N. J. Butko, J. R. Movellan - 2009

1 paper in library cites

[25]Q-Learning of Sequential Attention for Visual Object Recognition From Informative Local Descriptors

L. Paletta, G. Fritz, C. Seifert - 2005

1 paper in library cites

[26]Solving Deep Memory POMDPs With Recurrent Policy Gradients

Daan Wierstra, A. Foerster, J. Peters, Jürgen Schmidhuber - 2007

1 paper in library cites

Cited by 5 papers in your library

Cites 7 papers in your library

Read on August 2, 2025

It's not as good as the other paper (DRAW), but it's a precursor and it's so nice how the model learns to pay attention. Also very nice to see RL in the mix, and see the possible usages in games and other things.
