2018

Adaptive Input Representations for Neural Language Modeling

A. Baevski, Michael Auli

AI summary

This paper introduces adaptive input embeddings, which extend the adaptive softmax to the input word representations of neural language models, and evaluates them on the WIKITEXT-103 and BILLION WORD benchmarks. The resulting models achieve state-of-the-art perplexity and train more than twice as fast as models that use character input CNNs.
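As a rough illustration of the idea in the summary, the Python sketch below splits a frequency-sorted vocabulary into bands, gives band i an embedding of dimension d / k^i, and projects every band back to a shared model dimension d. Frequent words therefore get wide vectors and rare words narrow ones, while every word still enters the network as a d-dimensional vector. This is a minimal sketch assuming word ids are already sorted by descending frequency; the helper names `build_adaptive_input`, `find_band` and `embed`, the random initialization and the toy sizes are hypothetical and not taken from the paper or from any library.

```python
import random


def build_adaptive_input(cutoffs, d, k, seed=0):
    """Hypothetical helper: build per-band embedding tables and projections.

    cutoffs: increasing band boundaries ending at the vocabulary size,
             e.g. [4, 10, 20] means ids 0-3, 4-9 and 10-19.
    Assumes ids are sorted by descending frequency (id 0 = most frequent).
    Band i uses dimension d // k**i and a (d_i x d) projection to width d.
    """
    rng = random.Random(seed)
    bands = []
    start = 0
    for i, end in enumerate(cutoffs):
        d_i = max(1, d // (k ** i))
        # Embedding table for this band: one d_i-dimensional row per word.
        table = [[rng.gauss(0.0, 0.1) for _ in range(d_i)]
                 for _ in range(end - start)]
        # Projection matrix mapping d_i -> d (no bias).
        proj = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d_i)]
        bands.append((start, end, table, proj))
        start = end
    return bands


def find_band(bands, word_id):
    """Return the index of the band that contains word_id."""
    for i, (start, end, _, _) in enumerate(bands):
        if start <= word_id < end:
            return i
    raise ValueError(f"word id {word_id} is outside the vocabulary")


def embed(bands, word_id):
    """Look up the band embedding and project it to the model width d."""
    start, _, table, proj = bands[find_band(bands, word_id)]
    row = table[word_id - start]  # d_i-dimensional band embedding
    d = len(proj[0])
    # out[j] = sum_r row[r] * proj[r][j]
    return [sum(row[r] * proj[r][j] for r in range(len(row)))
            for j in range(d)]


bands = build_adaptive_input(cutoffs=[4, 10, 20], d=8, k=2)
print([len(table[0]) for _, _, table, _ in bands])  # prints [8, 4, 2]
print(len(embed(bands, 15)))                        # prints 8
```

With `cutoffs=[4, 10, 20]`, `d=8` and `k=2`, the three bands store 8-, 4- and 2-dimensional vectors, yet `embed` returns an 8-dimensional vector for every id, so the layers above the embedding never need to know which band a word came from.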

Main Contributions

  • Introduces adaptive input embeddings that extend adaptive softmax to input word representations.
  • Demonstrates that adaptive input embeddings reduce overfitting to rare words by assigning more capacity to frequent words and less to infrequent ones.
  • Shows that models with adaptive word representations outperform strong character-based models while training more than twice as fast.
  • Achieves a perplexity of 18.7 on the WIKITEXT-103 benchmark.
  • Achieves a perplexity of 23.02 on the BILLION WORD benchmark.
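The perplexity figures above follow the standard definition: the exponentiated average negative log-likelihood per token. The Python function below is a minimal sketch of that definition, assuming natural-log token probabilities; it does not reproduce the paper's evaluation setup (context handling, tokenization). By the abstract's own numbers, the previous best published WIKITEXT-103 result was 18.7 + 10.5 = 29.2.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-(1/N) * sum of natural-log token probabilities)."""
    if not token_log_probs:
        raise ValueError("perplexity needs at least one token")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# A model that assigns every token probability 1/18.7 has perplexity 18.7
# (up to floating-point rounding).
uniform = [math.log(1 / 18.7)] * 100
print(round(perplexity(uniform), 6))  # prints 18.7
```

Read the other way, a perplexity of 18.7 means the model is, on average, as uncertain as if it were choosing uniformly among about 18.7 words at each position.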

Abstract

We introduce adaptive input representations for neural language modeling, which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity. There are several choices for how to factorize the input and output layers, and whether to model words, characters or sub-word units. We perform a systematic comparison of popular choices for a self-attentional architecture. Our experiments show that models equipped with adaptive embeddings train more than twice as fast as the popular character input CNN while having fewer parameters. On the WIKITEXT-103 benchmark we achieve 18.7 perplexity, an improvement of 10.5 perplexity over the previously best published result, and on the BILLION WORD benchmark we achieve 23.02 perplexity.
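The abstract's parameter claim is made against the character input CNN. A simpler way to see where adaptive input saves parameters is to compare it with a plain word embedding that gives every word a full-width d-dimensional vector. The Python sketch below counts parameters for both, assuming that band i has dimension d // k^i and that every band, including the first, has a bias-free d_i-by-d projection. The vocabulary size, band cutoffs, d = 1024 and k = 4 are illustrative assumptions, not a restatement of the paper's configurations.

```python
def full_embedding_params(vocab_size, d):
    """One d-dimensional vector per word."""
    return vocab_size * d


def adaptive_input_params(cutoffs, d, k):
    """Parameter count of an adaptive input layer (sketch).

    cutoffs: increasing band boundaries ending at the vocabulary size.
    Assumption: band i has dimension d // k**i plus a bias-free
    projection from that dimension up to d.
    """
    total, start = 0, 0
    for i, end in enumerate(cutoffs):
        d_i = d // (k ** i)
        total += (end - start) * d_i  # band embedding table
        total += d_i * d              # projection d_i -> d
        start = end
    return total


# Illustrative (assumed) sizes: a ~268K-word vocabulary split into
# bands of 20K, 40K and the remaining ~208K words.
vocab, d, k = 267_735, 1024, 4
cutoffs = [20_000, 60_000, vocab]
print(f"{full_embedding_params(vocab, d):,}")       # prints 274,160,640
print(f"{adaptive_input_params(cutoffs, d, k):,}")  # prints 45,391,296
```

Under these assumed sizes the adaptive layer needs roughly one sixth of the parameters of the full-width table. Most of the savings come from the rare-word band, which holds most of the vocabulary but stores only 64-dimensional vectors.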

References [34]

K. He, X. Zhang, S. Ren, Jian Sun - 2016

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin - 2017

Yoshua Bengio, R. Ducharme, Pascal Vincent - 2001

Frank Hutter - 2017

R. Sennrich, B. Haddow, Alexandra Birch - 2016

Razvan Pascanu, Tomas Mikolov, Yoshua Bengio - 2013

Tomas Mikolov, M. Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2010

Ilya Sutskever, James Martens, G. Dahl, Geoffrey Hinton - 2013

Noam Shazeer, Azalia Mirhoseini, K. Maziarz, A. Davis, Quoc Le, Geoffrey Hinton, Jeffrey Dean - 2017

S. Merity, Caiming Xiong, J. Bradbury, Richard Socher - 2017

Tomas Mikolov, S. Kombrink, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2011

R. Jozefowicz, Oriol Vinyals, M. Schuster, Noam Shazeer, Yonghui Wu - 2016

F. Morin, Yoshua Bengio - 2005

C. Chelba, Tomas Mikolov, M. Schuster, Q. Ge, T. Brants, P. Koehn, Tony Robinson - 2013

O. Press, Lior Wolf - 2017

R. Al-Rfou, D. Choe, Noah Constant, M. Guo, Llion Jones - 2018

E. Grave, Armand Joulin, Nicolas Usunier - 2016

E. Grave, Armand Joulin, M. Cisse, D. Grangier, Hervé Jégou - 2017

Yann N. Dauphin, A. Fan, Michael Auli, D. Grangier - 2016

Holger Schwenk, A. Rousseau, M. Attik - 2012

Yoon Kim, Yacine Jernite, D. Sontag, Alexander M. Rush - 2016

J. T. Goodman - 2001

H. Inan, K. Khosravi, Richard Socher - 2017

Ashish Vaswani, Y. Zhao, V. Fossum, D. Chiang - 2013

M. Ott, S. Edunov, D. Grangier, Michael Auli - 2018

Noam Shazeer, J. Pelemans, C. Chelba - 2015

S. Merity, Nitish Shirish Keskar, Richard Socher - 2018

E. Arisoy, T. N. Sainath, Brian Kingsbury, Bhuvana Ramabhadran - 2012

J. W. Rae, C. Dyer, Peter Dayan, T. P. Lillicrap - 2018

Weizhu Chen, D. Grangier, Michael Auli - 2015

M. Ott, Michael Auli, D. Grangier, Marc'aurelio Ranzato - 2018

J. Buckman, Graham Neubig - 2018

P. Baltescu, Phil Blunsom - 2015

S. J. Mielke, J. Eisner - 2018
