2017

Using the Output Embedding to Improve Language Models

O. Press, Lior Wolf

AI summary

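This paper analyzes the output embedding matrix of neural network language models and shows that it constitutes a valid word embedding. It recommends tying the input and output embeddings and proposes a method for regularizing the output embedding. Together these reduce perplexity across a variety of neural network language models, including LSTM-based ones, and weight tying shrinks neural translation models to less than half of their original size without harming performance.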

Main Contributions

  • It is shown that the output embedding matrix of neural network language models is a valid word embedding.
  • Tying the input embedding and the output embedding is recommended when training language models.
  • A new method of regularizing the output embedding is offered.
  • Weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
  • The approach leads to a significant reduction in perplexity on a variety of neural network language models.
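
To make the size claim concrete, here is a minimal sketch of how tying changes the embedding parameter count. The function name `embedding_params` and the sizes used (a 10,000-word vocabulary with 200-dimensional embeddings) are illustrative assumptions, not figures taken from the paper.

```python
def embedding_params(vocab_size, embed_dim, tied):
    """Count the parameters held in the input and output embedding matrices.

    Each matrix has vocab_size * embed_dim entries. With tying, the input
    and output embeddings share one matrix, so it is counted only once.
    """
    per_matrix = vocab_size * embed_dim
    return per_matrix if tied else 2 * per_matrix


# Hypothetical sizes, chosen only for illustration.
untied = embedding_params(10_000, 200, tied=False)
tied = embedding_params(10_000, 200, tied=True)

print(untied)  # 4000000
print(tied)    # 2000000
```

Tying halves the embedding parameters; how much the full model shrinks depends on how many parameters sit outside the embedding matrices.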

Abstract

We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
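
As a rough illustration of what tying the input and output embeddings means, the sketch below uses one matrix both to look up the input word vector and to score every candidate next word. It is a toy written in plain Python under stated assumptions, not the authors' implementation: the names `next_word_probs` and `identity`, the 3-word vocabulary, and the identity function standing in for the recurrent layers are all hypothetical.

```python
import math


def softmax(scores):
    """Convert a list of scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identity(vector):
    """Placeholder for the hidden layers (e.g. an LSTM) of a real model."""
    return vector


def next_word_probs(embedding, word_id, hidden_fn=identity):
    """Predict next-word probabilities with a single tied embedding matrix."""
    x = embedding[word_id]  # input embedding: row lookup
    h = hidden_fn(x)        # hidden state; must have the embedding dimension
    # Output embedding: score each word by the dot product of its row with h.
    # The same matrix is reused here, which is the weight tying.
    logits = [sum(w * hj for w, hj in zip(row, h)) for row in embedding]
    return softmax(logits)


# Toy 3-word vocabulary with 2-dimensional embeddings (hypothetical values).
E = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
probs = next_word_probs(E, word_id=0)
```

Because the same rows serve as both input vectors and output scoring vectors, the hidden state fed to the output layer must have the same dimension as the embedding.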
