2013

Linguistic Regularities in Continuous Space Word Representations

Tomas Mikolov, W. T. Yih, Geoffrey Zweig

Cite Score: 75

AI summary

This paper examines the vector-space word representations learned by continuous space language models and finds that they capture syntactic and semantic regularities in language, with each relationship characterized by a relation-specific vector offset. It introduces a new syntactic test set and shows that the vector offset method outperforms previous systems on SemEval-2012 Task 2.

Main Contributions

  • Demonstrates that word representations learned by continuous space language models capture syntactic and semantic regularities in language.
  • Introduces a vector offset method for identifying linguistic regularities in continuous space word representations.
  • Presents a new dataset for measuring syntactic performance.
  • Achieves almost 40% accuracy on the new syntactic test set.
  • Outperforms the previous state of the art on SemEval-2012 Task 2.
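The vector offset method can be sketched as follows: to answer "a is to b as c is to ?", compute y = x_b - x_a + x_c and return the word whose vector has the highest cosine similarity to y. This is a minimal sketch under stated assumptions: the embeddings are hypothetical three-dimensional toy values, not vectors from the paper, and excluding the three question words from the candidate set is an assumption of this sketch rather than a detail taken from the page above.

```python
import math

# Hypothetical toy embeddings invented for illustration (NOT vectors from the paper).
# Dimensions loosely encode: royalty, maleness, femaleness.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' using the offset y = x_b - x_a + x_c."""
    y = [xb - xa + xc for xa, xb, xc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # Assumption of this sketch: the question words themselves are not valid answers.
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], y))


# "King - Man + Woman": a = man, b = king, c = woman
print(solve_analogy("man", "king", "woman", EMBEDDINGS))  # prints "queen"
```

With real embeddings the same loop runs over the whole vocabulary, so a practical version would normalize all vectors once and compute the similarities as a single matrix-vector product.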

Abstract

Continuous space language models have recently demonstrated outstanding results across a variety of tasks. In this paper, we examine the vector-space word representations that are implicitly learned by the input-layer weights. We find that these representations are surprisingly good at capturing syntactic and semantic regularities in language, and that each relationship is characterized by a relation-specific vector offset. This allows vector-oriented reasoning based on the offsets between words. For example, the male/female relationship is automatically learned, and with the induced vector representations, "King - Man + Woman" results in a vector very close to "Queen." We demonstrate that the word vectors capture syntactic regularities by means of syntactic analogy questions (provided with this paper), and are able to correctly answer almost 40% of the questions. We demonstrate that the word vectors capture semantic regularities by using the vector offset method to answer SemEval-2012 Task 2 questions. Remarkably, this method outperforms the best previous systems.
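SemEval-2012 Task 2 asks how strongly candidate word pairs exhibit a relation that is defined by a few prototypical example pairs. One plausible way to apply vector offsets to it, sketched below, scores a candidate pair (a, b) by the average cosine similarity between its offset x_b - x_a and the offsets of the prototype pairs. This is an assumption for illustration; the paper's exact scoring may differ, and the embeddings, prototypes, and function names are hypothetical.

```python
import math

# Hypothetical toy embeddings (NOT from the paper); the second dimension
# plays the role of a "female" direction.
EMB = {
    "man":   [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "boy":   [0.5, 0.0, 0.0],
    "girl":  [0.5, 1.0, 0.0],
    "apple": [0.0, 0.0, 0.5],
}


def offset(pair, emb):
    """Relation vector x_b - x_a for the word pair (a, b)."""
    a, b = pair
    return [xb - xa for xa, xb in zip(emb[a], emb[b])]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 0.0 if norm == 0 else dot / norm


def relation_score(candidate, prototypes, emb):
    """Average similarity of the candidate's offset to each prototype's offset."""
    cand = offset(candidate, emb)
    return sum(cosine(cand, offset(p, emb)) for p in prototypes) / len(prototypes)


prototypes = [("man", "woman"), ("king", "queen")]
candidates = [("king", "apple"), ("boy", "girl")]
ranked = sorted(candidates, key=lambda p: relation_score(p, prototypes, EMB), reverse=True)
print(ranked)  # [('boy', 'girl'), ('king', 'apple')]
```

Ranking candidates by this score yields an ordering of pairs from most to least prototypical, which is the kind of output the task evaluates.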

References [23]

  • Geoffrey Hinton, Ruslan Salakhutdinov - 2006 (cited by 37 papers in library)
  • Yoshua Bengio, R. Ducharme, Pascal Vincent - 2001 (cited by 62 papers in library)
  • M. P. Marcus, B. Santorini, Mary Ann Marcinkiewicz - 1993 (cited by 22 papers in library)
  • Tomas Mikolov, M. Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2010 (cited by 36 papers in library)
  • Ronan Collobert, Jason Weston - 2008 (cited by 32 papers in library)
  • J. Turian, L. Ratinov, Yoshua Bengio - 2010 (cited by 17 papers in library)
  • Tomas Mikolov, S. Kombrink, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2011 (cited by 16 papers in library)
  • Geoffrey E. Hinton - 1986 (cited by 13 papers in library)
  • F. Morin, Yoshua Bengio - 2005 (cited by 19 papers in library)
  • A. Mnih, Geoffrey E. Hinton - 2009 (cited by 16 papers in library)
  • Tomas Mikolov, A. Deoras, D. Povey, Lukas Burget, Jan Cernocky - 2011 (cited by 9 papers in library)
  • Holger Schwenk - 2007 (cited by 12 papers in library)
  • S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, R. Harshman - 1990 (cited by 12 papers in library)
  • J. B. Pollack - 1990 (cited by 7 papers in library)
  • Antoine Bordes, Xavier Glorot, Jason Weston, Yoshua Bengio - 2012 (cited by 2 papers in library)
  • H. S. Le, I. Oparin, A. Allauzen, Jean Luc Gauvain, F. Yvon - 2011 (cited by 7 papers in library)
  • Jeffrey L. Elman - 1991 (cited by 5 papers in library)
  • Yoshua Bengio, Holger Schwenk, Jean Sebastien Senecal, F. Morin, Jean Luc Gauvain - 2006 (cited by 3 papers in library)
  • P. D. Turney - 2012 (cited by 2 papers in library)
  • David Jurgens, Saif M. Mohammad, P. Turney, K. Holyoak - 2012 (cited by 2 papers in library)
  • Geoffrey Hinton, Ruslan Salakhutdinov - 2010 (cited by 1 paper in library)
  • Tomas Mikolov - 2012 (cited by 1 paper in library)
  • B. Rink, S. Harabagiu - 2012 (cited by 1 paper in library)

Cited by 8 papers in your library

Cites 15 papers in your library

Read on March 20, 2025