2018
Cite Score
22
AI summary
This paper introduces adaptive input embeddings, extending adaptive softmax for neural language modeling, and evaluates them on the WIKITEXT-103 and BILLION WORD benchmarks, achieving state-of-the-art perplexity scores and faster training times compared to character input CNNs.
Abstract
We introduce adaptive input representations for neural language modeling, which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity. There are several choices for how to factorize the input and output layers, and whether to model words, characters, or sub-word units. We perform a systematic comparison of popular choices for a self-attentional architecture. Our experiments show that models equipped with adaptive embeddings are more than twice as fast to train as the popular character input CNN while having fewer parameters. On the WIKITEXT-103 benchmark we achieve 18.7 perplexity, an improvement of 10.5 perplexity over the previous best published result, and on the BILLION WORD benchmark we achieve 23.02 perplexity.
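One way to picture "input representations of variable capacity" is a frequency-sorted vocabulary split into bands, where frequent words get wide embedding vectors and rare words get narrow ones, each projected up to a shared model width. The sketch below is a toy illustration under those assumptions, not the authors' implementation: the class name `AdaptiveInput`, the `cutoffs` and `factor` parameters, the per-band linear projection, and the random initialization are all hypothetical choices that the abstract does not specify.

```python
import random


class AdaptiveInput:
    """Toy adaptive input embedding (illustrative assumption, not the paper's code)."""

    def __init__(self, cutoffs, d_model, factor=4, seed=0):
        # cutoffs: ascending band boundaries over a frequency-sorted vocabulary.
        # [4, 100] means band 0 covers ids [0, 4) and band 1 covers ids [4, 100);
        # the last value is the vocabulary size.
        rng = random.Random(seed)
        self.d_model = d_model
        self.bands = []
        start = 0
        for i, end in enumerate(cutoffs):
            # Assumed rule: each later (rarer) band gets a narrower embedding.
            dim = max(1, d_model // factor ** i)
            table = [[rng.gauss(0, 1) for _ in range(dim)]
                     for _ in range(end - start)]
            # Linear projection from the band width up to the shared model width.
            proj = [[rng.gauss(0, 1) for _ in range(dim)]
                    for _ in range(d_model)]
            self.bands.append((start, end, table, proj))
            start = end

    def embed(self, word_id):
        """Return a d_model-sized vector for a word id."""
        for start, end, table, proj in self.bands:
            if start <= word_id < end:
                vec = table[word_id - start]
                return [sum(w * x for w, x in zip(row, vec)) for row in proj]
        raise IndexError(f"word id {word_id} is outside the vocabulary")

    def num_parameters(self):
        """Count embedding-table and projection weights across all bands."""
        return sum(len(t) * len(t[0]) + len(p) * len(p[0])
                   for _, _, t, p in self.bands)


# Usage: 100 words, model width 8, rare band uses width 8 // 4 = 2.
emb = AdaptiveInput(cutoffs=[4, 100], d_model=8, factor=4)
frequent_vec = emb.embed(0)   # looked up in the wide band
rare_vec = emb.embed(99)      # looked up in the narrow band, then projected
fixed_width_params = 100 * 8  # a single table with every word at full width
```

Every word still yields a vector of the same width, so the rest of the model is unchanged. In this toy configuration the parameter count comes out below that of a fixed-width table, which is the kind of saving the abstract refers to when it mentions a lower number of parameters.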
References [34]
K. He, X. Zhang, S. Ren, Jian Sun - 2016
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin - 2017
Yoshua Bengio, R. Ducharme, Pascal Vincent - 2001
Frank Hutter - 2017
R. Sennrich, B. Haddow, Alexandra Birch - 2016
Razvan Pascanu, Tomas Mikolov, Yoshua Bengio - 2013
Tomas Mikolov, M. Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2010
Ilya Sutskever, James Martens, G. Dahl, Geoffrey Hinton - 2013
Noam Shazeer, Azalia Mirhoseini, K. Maziarz, A. Davis, Quoc Le, Geoffrey Hinton, Jeffrey Dean - 2017
S. Merity, Caiming Xiong, J. Bradbury, Richard Socher - 2017
Tomas Mikolov, S. Kombrink, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2011
R. Jozefowicz, Oriol Vinyals, M. Schuster, Noam Shazeer, Yonghui Wu - 2016
F. Morin, Yoshua Bengio - 2005
C. Chelba, Tomas Mikolov, M. Schuster, Q. Ge, T. Brants, P. Koehn, Tony Robinson - 2013
O. Press, Lior Wolf - 2017
R. Al-Rfou, D. Choe, Noah Constant, M. Guo, Llion Jones - 2018
E. Grave, Armand Joulin, Nicolas Usunier - 2016
E. Grave, Armand Joulin, M. Cisse, D. Grangier, Hervé Jégou - 2017
Yann N. Dauphin, A. Fan, Michael Auli, D. Grangier - 2016
Holger Schwenk, A. Rousseau, M. Attik - 2012
Yoon Kim, Yacine Jernite, D. Sontag, Alexander M. Rush - 2016
J. T. Goodman - 2001
H. Inan, K. Khosravi, Richard Socher - 2017
Ashish Vaswani, Y. Zhao, V. Fossum, D. Chiang - 2013
M. Ott, S. Edunov, D. Grangier, Michael Auli - 2018
Noam Shazeer, J. Pelemans, C. Chelba - 2015
S. Merity, Nitish Shirish Keskar, Richard Socher - 2018
E. Arisoy, T. N. Sainath, Brian Kingsbury, Bhuvana Ramabhadran - 2012
J. W. Rae, C. Dyer, Peter Dayan, T. P. Lillicrap - 2018
Weizhu Chen, D. Grangier, Michael Auli - 2015
M. Ott, Michael Auli, D. Grangier, Marc'aurelio Ranzato - 2018
J. Buckman, Graham Neubig - 2018
P. Baltescu, Phil Blunsom - 2015
S. J. Mielke, J. Eisner - 2018