2015

Show, Attend and Tell: Neural Image Caption Generation With Visual Attention

K. Xu, Jimmy Lei Ba, R. Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, R. Zemel, Yoshua Bengio

Cite Score: 88

AI summary

This paper introduces an attention-based model that learns to describe the content of images, combining a convolutional network that encodes the image with a recurrent network that generates the caption. It reports state-of-the-art performance, as of its 2015 publication, on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.

Main Contributions

  • Introduces two attention-based image caption generators under a common framework: a deterministic attention mechanism and a stochastic attention mechanism.
  • Shows how to gain insight and interpret the results of this framework by visualizing where and what the attention focused on.
  • Validates the usefulness of attention in caption generation with state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
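
The difference between the two attention mechanisms named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`soft_attention`, `hard_attention`), the toy feature vectors, and the fixed scores are all assumptions made for the example. In the paper's setting the scores would come from a learned function of the image features and the decoder state, which is omitted here.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, scores):
    """Deterministic variant: the context is the attention-weighted average of all features."""
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights


def hard_attention(features, scores, rng):
    """Stochastic variant: sample a single location from the attention distribution."""
    weights = softmax(scores)
    index = rng.choices(range(len(features)), weights=weights, k=1)[0]
    return list(features[index]), index


# Toy input (made-up numbers): 3 image locations, each with a 2-dimensional feature vector.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]  # equal scores give uniform attention weights

soft_context, weights = soft_attention(features, scores)
hard_context, picked = hard_attention(features, scores, random.Random(0))
```

The soft variant is differentiable end to end, which is why it can be trained with standard backpropagation. The hard variant involves a sampling step, so it needs a different training objective.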

Abstract

Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to automatically learn to fix its gaze on salient objects while generating the corresponding words in the output sequence. We validate the use of attention with state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
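
The "variational lower bound" mentioned in the abstract can be written out as follows. The notation is an assumption for this sketch: $a$ denotes the image features, $s$ the sampled attention locations, and $y$ the caption. The inequality is Jensen's inequality applied to the logarithm of the marginal likelihood.

```latex
\begin{aligned}
\log p(y \mid a)
  &= \log \sum_{s} p(s \mid a)\, p(y \mid s, a) \\
  &\ge \sum_{s} p(s \mid a)\, \log p(y \mid s, a) \;=\; L_s .
\end{aligned}
```

Maximizing $L_s$ therefore pushes up a lower bound on the log-likelihood of the caption. Because the sum over $s$ is intractable in general, its gradient would typically be estimated by sampling attention locations.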



Cited by: 12 papers in your library

Cites: 27 papers in your library

Read on August 3, 2025
