2014
Cite Score: 51
AI summary
This paper introduces an encoder-decoder pipeline that learns a multimodal joint embedding space for images and text, together with a structure-content neural language model (SC-NLM) for decoding. Using an LSTM to encode sentences, it matches state-of-the-art performance on Flickr8K and Flickr30K without object detections, and it sets new best results when using the 19-layer Oxford convolutional network.
Abstract
Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a) a multimodal joint embedding space with images and text and (b) a novel language model for decoding distributed representations from our space. Our pipeline effectively unifies joint image-text embedding models with multimodal neural language models. We introduce the structure-content neural language model that disentangles the structure of a sentence from its content, conditioned on representations produced by the encoder. The encoder allows one to rank images and sentences, while the decoder can generate novel descriptions from scratch. Using an LSTM to encode sentences, we match the state-of-the-art performance on Flickr8K and Flickr30K without using object detections. We also set new best results when using the 19-layer Oxford convolutional network. Furthermore, we show that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic, e.g., *image of a blue car* - "blue" + "red" is near images of red cars. Sample captions generated for 800 images are made available for comparison.
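The abstract's two retrieval claims, ranking images and sentences with the encoder and querying the joint space with vector arithmetic, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes embeddings have already been produced by some image and sentence encoder (replaced here by hypothetical hand-written 3-d toy vectors), scores pairs by cosine similarity, and shows a margin-based pairwise ranking (hinge) loss as one common way to train such a joint space, with a hypothetical margin of 0.2. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm so that dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity s(u, v) between two embeddings."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def rank(query: Sequence[float], items: Dict[str, Vector]) -> List[str]:
    """Names of the embedded items, most similar to `query` first."""
    return sorted(items, key=lambda name: cosine(query, items[name]), reverse=True)


def ranking_loss(image: Sequence[float], sentence: Sequence[float],
                 contrastive_sentences: List[Vector],
                 contrastive_images: List[Vector],
                 margin: float = 0.2) -> float:
    """Margin-based pairwise ranking (hinge) loss for one matching image-sentence
    pair: every non-matching sentence and image should score at least `margin`
    below the matching pair. margin=0.2 is a hypothetical value."""
    positive = cosine(image, sentence)
    loss = sum(max(0.0, margin - positive + cosine(image, s))
               for s in contrastive_sentences)
    loss += sum(max(0.0, margin - positive + cosine(sentence, i))
                for i in contrastive_images)
    return loss


def analogy_query(image: Sequence[float], remove_word: Sequence[float],
                  add_word: Sequence[float]) -> Vector:
    """Build image - remove_word + add_word, normalizing each term first
    (the per-term normalization is an assumption of this sketch)."""
    i, r, a = normalize(image), normalize(remove_word), normalize(add_word)
    return [x - y + z for x, y, z in zip(i, r, a)]


if __name__ == "__main__":
    # Hypothetical 3-d toy embeddings whose axes loosely mean [car, blue, red];
    # a trained model would produce these with its image and sentence encoders.
    words = {"blue": [0.0, 1.0, 0.0], "red": [0.0, 0.0, 1.0]}
    images = {
        "blue_car": [1.0, 1.0, 0.0],
        "red_car": [1.0, 0.0, 1.0],
        "red_apple": [0.0, 0.0, 1.0],
    }
    query = analogy_query(images["blue_car"], words["blue"], words["red"])
    print(rank(query, images))  # ['red_car', 'red_apple', 'blue_car']
```

In the toy demo, subtracting "blue" from the blue-car vector and adding "red" moves the query closest to the red-car image, mirroring the *image of a blue car* - "blue" + "red" example; with real learned embeddings the nearest neighbors need not separate this cleanly.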
Citation Graph
Cited by 5 papers in your library
Cites 24 papers in your library
Read on October 13, 2025