2015

Show and Tell: A Neural Image Caption Generator

Dumitru Erhan


AI summary

This paper introduces the Neural Image Caption generator (NIC), which uses a CNN to encode an image and an LSTM to generate a sentence describing it. The model is trained to maximize the likelihood of the target description sentence given the training image, and it reports state-of-the-art results on the Pascal, Flickr30k, SBU, and COCO datasets.
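The likelihood objective named in the summary can be illustrated with a toy calculation. The sketch below assumes the usual chain-rule factorization, in which the log-likelihood of a caption is the sum over words of log p(S_t | I, S_0..S_{t-1}). The function name `caption_log_likelihood` and the toy probabilities are hypothetical, and no real CNN or LSTM is involved; the per-step distributions simply stand in for what a decoder would output.

```python
import math

def caption_log_likelihood(step_probs, target_ids):
    """Sum of log p(S_t | I, S_0..S_{t-1}) over the words of one caption.

    step_probs[t] is the (hypothetical) distribution over the vocabulary
    that a decoder would emit at step t; target_ids[t] is the index of
    the ground-truth word at that step.
    """
    if len(step_probs) != len(target_ids):
        raise ValueError("need one distribution per target word")
    return sum(math.log(probs[word]) for probs, word in zip(step_probs, target_ids))

# Toy example (assumed values): a 3-word vocabulary and a 2-word caption.
step_probs = [[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]
target_ids = [0, 1]

# Training minimizes the negative log-likelihood, which is equivalent
# to maximizing the likelihood of the target caption.
loss = -caption_log_likelihood(step_probs, target_ids)
```

In a full system this loss would be summed over all image-caption pairs in the training set and minimized with respect to the model parameters.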

Main Contributions

  • Introduces an end-to-end neural network system (NIC) for generating image captions.
  • Combines state-of-the-art sub-networks for vision (CNN) and language modeling (LSTM).
  • Outperforms prior state-of-the-art approaches on the Pascal, Flickr30k, SBU, and COCO datasets.
  • Demonstrates the effect of transfer learning and training-set size on image captioning.
  • Provides an analysis of the learned word embeddings and generation diversity.

Abstract

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU-1 score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU-1 score improvements on Flickr30k, from 56 to 66, and on SBU, from 19 to 28. Lastly, on the newly released COCO dataset, we achieve a BLEU-4 of 27.7, which is the current state-of-the-art.
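The BLEU-1 figures quoted above are on a 0 to 100 scale. As a rough illustration of what the metric measures, here is a minimal sentence-level sketch: clipped unigram precision multiplied by a brevity penalty. The function name `bleu1`, the whitespace tokenization, and the example sentences are assumptions for illustration. Reported scores are normally computed at corpus level, and the exact evaluation setup behind the numbers above is not specified here.

```python
import math
from collections import Counter

def bleu1(candidate, references):
    """Sentence-level BLEU-1 on a 0-100 scale (illustrative sketch)."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand or not refs:
        return 0.0

    # For each word, the most times it appears in any single reference.
    max_ref_counts = Counter()
    for ref in refs:
        for word, count in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)

    # Clipped unigram precision: a candidate word is credited at most
    # as often as it occurs in the most generous reference.
    clipped = sum(min(count, max_ref_counts[word])
                  for word, count in Counter(cand).items())
    precision = clipped / len(cand)

    # Brevity penalty against the reference length closest to the
    # candidate length (ties resolved toward the shorter reference).
    ref_len = min((len(r) for r in refs),
                  key=lambda n: (abs(n - len(cand)), n))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))

    return 100.0 * bp * precision

# Hypothetical captions: every candidate word matches, but the candidate
# is shorter than the reference, so the brevity penalty lowers the score.
score_short = bleu1("a dog runs on grass", ["a dog runs on the grass"])

# Repeating a word is not rewarded: only one "the" is credited.
score_repeat = bleu1("the the the", ["the cat"])
```

Clipping is what keeps a degenerate caption that repeats a common word from scoring well, and the brevity penalty discourages very short captions that would otherwise achieve high precision.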


