2017
AI summary
This paper introduces adaptive softmax, an approximation for training language models with very large vocabularies on GPUs. It groups words into clusters chosen to minimize expected computation time and exploits efficient matrix-matrix operations, yielding a large efficiency gain over standard approximations with accuracy close to that of the full softmax on the EuroParl and One Billion Word benchmarks.
Abstract
We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computation time. Our approach further reduces the computational time by exploiting the specificities of modern architectures and matrix-matrix vector operations, making it particularly suited for graphical processing units. Our experiments carried out on standard benchmarks, such as EuroParl and One Billion Word, show that our approach brings a large gain in efficiency over standard approximations while achieving an accuracy close to that of the full softmax. The code of our method is available at https://github.com/facebookresearch/adaptive-softmax.
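The clustering idea in the abstract can be illustrated with a toy cost model. The sketch below is not the authors' implementation (that is in the repository linked above). It assumes a Zipf-like unigram distribution and assumes the per-token cost of a softmax is simply proportional to the number of output rows evaluated, ignoring the fixed GPU overheads that a realistic cost model would include. All function names are hypothetical and chosen for illustration only.

```python
def zipf_probs(vocab_size, exponent=1.0):
    """Unigram probabilities proportional to 1 / rank**exponent (an assumed distribution)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_cost(probs, head_size):
    """Expected number of output rows evaluated per token for a two-cluster split.

    The head holds the `head_size` most frequent words plus one extra row that
    scores the tail cluster as a whole. The tail softmax is evaluated only for
    tokens whose target word falls in the tail.
    """
    vocab_size = len(probs)
    if head_size >= vocab_size:
        return float(vocab_size)  # no tail cluster: plain full softmax
    tail_prob = sum(probs[head_size:])
    tail_size = vocab_size - head_size
    return (head_size + 1) + tail_prob * tail_size


def best_head_size(probs, candidates):
    """Return the candidate head size with the lowest expected cost."""
    return min(candidates, key=lambda h: expected_cost(probs, h))


if __name__ == "__main__":
    probs = zipf_probs(100_000)
    candidates = [1_000, 2_000, 5_000, 10_000, 20_000, 50_000, 100_000]
    for head_size in candidates:
        print(head_size, round(expected_cost(probs, head_size)))
    print("best head size:", best_head_size(probs, candidates))
```

Under these assumptions, a head that covers only the most frequent words costs far fewer rows per token than the full softmax, because most tokens never trigger the tail computation. This is the intuition behind exploiting the unbalanced word distribution; the actual method chooses its clusters against a measured computation-time model rather than a simple row count.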