2014

Unifying Visual-Semantic Embeddings With Multimodal Neural Language Models

Richard S. Zemel

Cite Score

51

AI summary

This paper introduces an encoder-decoder pipeline that learns a joint multimodal embedding space for images and text, together with a structure-content neural language model (SC-NLM) that decodes representations from that space. With an LSTM sentence encoder, the model matches state-of-the-art performance on Flickr8K and Flickr30K without using object detections.

Main Contributions

  • Introduces an encoder-decoder pipeline for multimodal learning.
  • Proposes a structure-content neural language model (SC-NLM) that disentangles sentence structure from content.
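  • Matches state-of-the-art performance on Flickr8K and Flickr30K without object detections, and sets new best results when using the 19-layer Oxford convolutional network.
  • Shows that, with linear encoders, the learned embedding space captures multimodal regularities expressible as vector space arithmetic.
  • Makes sample captions generated for 800 images available for comparison.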

Abstract

Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a) a multimodal joint embedding space with images and text and (b) a novel language model for decoding distributed representations from our space. Our pipeline effectively unifies joint image-text embedding models with multimodal neural language models. We introduce the structure-content neural language model, which disentangles the structure of a sentence from its content, conditioned on representations produced by the encoder. The encoder allows one to rank images and sentences, while the decoder can generate novel descriptions from scratch. Using an LSTM to encode sentences, we match the state-of-the-art performance on Flickr8K and Flickr30K without using object detections. We also set new best results when using the 19-layer Oxford convolutional network. Furthermore, we show that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic, e.g., *image of a blue car* - "blue" + "red" is near images of red cars. Sample captions generated for 800 images are made available for comparison.
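
The vector-arithmetic claim in the abstract can be illustrated with a toy example. The sketch below is not the paper's model: the three-dimensional vectors, the file names, and the helper functions (normalize, cosine, rank_by_similarity) are all hypothetical, and cosine similarity is assumed as the ranking score. It only shows how a query of the form *image of a blue car* - "blue" + "red" could be ranked against image embeddings once images and words share one space.

```python
import math


def normalize(v):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def rank_by_similarity(query, candidates):
    """Return candidate names sorted from most to least similar to the query."""
    return sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)


# Hypothetical joint space; axes loosely mean [car-ness, blue-ness, red-ness].
# Real embeddings would come from trained image and sentence encoders.
word_vecs = {"blue": [0.0, 1.0, 0.0], "red": [0.0, 0.0, 1.0]}
image_vecs = {
    "blue_car.jpg": [1.0, 0.9, 0.0],
    "red_car.jpg": [1.0, 0.0, 0.9],
    "blue_sky.jpg": [0.0, 1.0, 0.1],
}

# Build the query: image of a blue car - "blue" + "red".
query = [
    img - blue + red
    for img, blue, red in zip(image_vecs["blue_car.jpg"], word_vecs["blue"], word_vecs["red"])
]

ranking = rank_by_similarity(query, image_vecs)
print(ranking)
```

In this toy space the red car image ranks first. The same rank_by_similarity helper could rank images against a sentence embedding, or sentences against an image embedding, which is the retrieval use of the encoder that the abstract describes.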
