2017

Efficient Softmax Approximation for GPUs

E. Grave, Armand Joulin, M. Cisse, D. Grangier, Hervé Jégou

AI summary

This paper introduces adaptive softmax, an approximation for training language models with very large vocabularies on GPUs. It groups words into clusters chosen to minimize expected computation time and exploits efficient matrix-matrix operations, yielding large efficiency gains with accuracy close to that of the full softmax on the EuroParl and One Billion Word datasets.

Main Contributions

  • Introduces a strategy to produce an approximate hierarchical model that considers the computation time of matrix-matrix multiplications.
  • Provides an empirical analysis of the model on recent GPUs, leading to a realistic computation time model.
  • Achieves speed-ups of 2x to 10x over the regular softmax.
  • Improves accuracy under computational constraints on large corpora.
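
To illustrate what it means to choose clusters by computation time, here is a minimal sketch of a two-cluster split. The cost model is an assumption made for illustration only: cost is taken to be proportional to the number of output rows evaluated, and word frequencies are assumed to follow a Zipf distribution. It is not the empirical GPU timing model the paper derives, and the names `zipf_probs`, `expected_cost`, and `best_cutoff` are hypothetical.

```python
def zipf_probs(vocab_size):
    """Assumed word distribution: probability proportional to 1 / rank."""
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_cost(probs, cutoff):
    """Expected number of output rows evaluated per target word.

    Assumed (simplified) cost model: the head, holding the `cutoff` most
    frequent words plus one row for the tail cluster, is always evaluated;
    the tail is evaluated only when the target word is a rare one.
    """
    vocab_size = len(probs)
    p_tail = sum(probs[cutoff:])
    return (cutoff + 1) + p_tail * (vocab_size - cutoff)


def best_cutoff(probs, candidates):
    """Pick the head size with the lowest expected cost."""
    return min(candidates, key=lambda c: expected_cost(probs, c))


probs = zipf_probs(10000)  # sorted from most to least frequent
cutoff = best_cutoff(probs, range(100, 10000, 100))
full_cost = len(probs)  # a full softmax evaluates every row
speedup = full_cost / expected_cost(probs, cutoff)
print(cutoff, round(speedup, 2))
```

Under this toy model the gain comes entirely from the unbalanced distribution: most targets fall in the small head, so the large tail is rarely evaluated. A realistic model would also need to account for fixed per-operation overhead on the GPU.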

Abstract

We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computation time. Our approach further reduces the computational time by exploiting the specificities of modern architectures and matrix-matrix vector operations, making it particularly suited for graphical processing units. Our experiments carried out on standard benchmarks, such as EuroParl and One Billion Word, show that our approach brings a large gain in efficiency over standard approximations while achieving an accuracy close to that of the full softmax. The code of our method is available at https://github.com/facebookresearch/adaptive-softmax.
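
The clustering idea in the abstract can be sketched as a two-level factorization: frequent words are scored directly in a "head" softmax, which also contains one entry standing for the cluster of rare words, and rare words are scored in a separate "tail" softmax. The sketch below is a minimal pure-Python illustration under that assumption, with a single tail cluster and made-up weights. The function names are hypothetical and this is not the code from the authors' repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def adaptive_softmax_probs(hidden, head_weights, tail_weights):
    """Return probabilities for frequent words followed by rare words.

    head_weights: one row per frequent word, plus a final row that
                  represents the tail cluster as a whole.
    tail_weights: one row per rare word.
    """
    head = softmax([dot(w, hidden) for w in head_weights])
    p_tail_cluster = head[-1]
    tail = softmax([dot(w, hidden) for w in tail_weights])
    # p(rare word) = p(tail cluster) * p(word | tail cluster)
    return head[:-1] + [p_tail_cluster * p for p in tail]


# Illustrative values only: 2 frequent words, 3 rare words, hidden size 3.
hidden = [0.5, -1.0, 2.0]
head_weights = [[0.1, 0.2, 0.3], [0.0, -0.5, 0.4], [0.3, 0.3, -0.2]]
tail_weights = [[0.2, 0.1, 0.0], [-0.1, 0.4, 0.2], [0.5, 0.0, -0.3]]

probs = adaptive_softmax_probs(hidden, head_weights, tail_weights)
print(probs, sum(probs))
```

The result is a valid distribution over the whole vocabulary. During training, the tail scores are needed only for examples whose target is a rare word, which is where the saving over a full softmax would come from.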
