2014

Distributed Representations of Sentences and Documents

Quoc Le, Tomas Mikolov

AI summary

This paper introduces Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length texts. It comes in two variants, Distributed Memory (PV-DM) and Distributed Bag of Words (PV-DBOW), and achieves state-of-the-art results on several text classification and sentiment analysis tasks.

Main Contributions

  • Introduces Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length texts.
  • Proposes two variants: the Distributed Memory Model of Paragraph Vectors (PV-DM) and the Distributed Bag of Words version of Paragraph Vector (PV-DBOW).
  • Achieves state-of-the-art results on sentiment analysis tasks, outperforming bag-of-words models and recursive neural networks on the Stanford Sentiment Treebank dataset.
  • Demonstrates strong performance on the IMDB dataset for sentiment analysis, surpassing previous methods.
  • Shows effectiveness on an information retrieval task, achieving a 32% relative reduction in error rate compared with bag-of-words and bag-of-bigrams baselines.
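
As a rough illustration of the PV-DBOW idea named above (one dense vector per document, trained to predict words sampled from that document), here is a minimal pure-Python sketch. It is not the authors' implementation: the function names `train_pv_dbow` and `word_probs`, the toy documents, and all hyperparameters are assumptions made for this example, and it uses a plain full softmax and samples one word per document per step, which is only practical for a tiny vocabulary.

```python
import math
import random

def train_pv_dbow(docs, dim=8, epochs=200, lr=0.1, seed=0):
    """Toy PV-DBOW-style trainer (hypothetical helper, full softmax)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    # One dense vector per document: these are the "paragraph vectors".
    D = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in docs]
    # Output weights: one row per vocabulary word.
    U = [[0.0] * dim for _ in vocab]
    for _ in range(epochs):
        for d, doc in enumerate(docs):
            # Sample one word from the document as the prediction target.
            target = index[rng.choice(doc)]
            probs = word_probs(D[d], U)
            grad_doc = [0.0] * dim
            for w, row in enumerate(U):
                # Gradient of cross-entropy w.r.t. the score of word w.
                g = probs[w] - (1.0 if w == target else 0.0)
                for k in range(dim):
                    grad_doc[k] += g * row[k]   # uses the pre-update weight
                    row[k] -= lr * g * D[d][k]
            for k in range(dim):
                D[d][k] -= lr * grad_doc[k]
    return D, U, vocab

def word_probs(doc_vec, U):
    """Softmax over the vocabulary given a single document vector."""
    scores = [sum(u * x for u, x in zip(row, doc_vec)) for row in U]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy corpus (illustrative only).
docs = [["good", "great", "fun"], ["bad", "boring", "dull"]]
D, U, vocab = train_pv_dbow(docs)
p0 = word_probs(D[0], U)
```

The rows of `D` are the fixed-length document features that would be handed to a downstream classifier. PV-DM differs in that the paragraph vector is combined with context word vectors to predict the next word, which this sketch does not cover.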

Abstract

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, “powerful,” “strong” and “Paris” are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.
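
The two bag-of-words weaknesses described in the abstract can be checked on a toy example. The sketch below is illustrative only: the vocabulary, the sentences, and the helper names `bag_of_words` and `euclidean` are assumptions made for this example, not taken from the paper.

```python
import math
from collections import Counter

def bag_of_words(tokens, vocab):
    """Count vector over a fixed vocabulary; word order is discarded."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical toy vocabulary.
vocab = ["powerful", "strong", "Paris", "dog", "bites", "man"]

# Single-word inputs become one-hot vectors.
powerful = bag_of_words(["powerful"], vocab)
strong = bag_of_words(["strong"], vocab)
paris = bag_of_words(["Paris"], vocab)

# Weakness 1 (semantics): a near-synonym is no closer than an unrelated word.
d_powerful_strong = euclidean(powerful, strong)
d_powerful_paris = euclidean(powerful, paris)

# Weakness 2 (ordering): sentences with opposite meanings get identical vectors.
s1 = bag_of_words("dog bites man".split(), vocab)
s2 = bag_of_words("man bites dog".split(), vocab)

print(d_powerful_strong == d_powerful_paris)
print(s1 == s2)
```

Both comparisons come out equal because every pair of distinct one-hot vectors is the same distance apart and because counting discards position.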
