2017
AI summary
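This paper introduces a Sparsely-Gated Mixture-of-Experts (MoE) layer that increases model capacity by more than 1000x with only minor losses in computational efficiency. A trainable gating network selects a sparse combination of feed-forward experts for each example. The authors apply a MoE with up to 137 billion parameters convolutionally between stacked LSTM layers and report significantly better results than the prior state of the art on large language modeling and machine translation benchmarks, at lower computational cost.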
Abstract
The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges. In this work, we address these challenges and finally realize the promise of conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters. We introduce a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks. A trainable gating network determines a sparse combination of these experts to use for each example. We apply the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora. We present model architectures in which a MoE with up to 137 billion parameters is applied convolutionally between stacked LSTM layers. On large language modeling and machine translation benchmarks, these models achieve significantly better results than state-of-the-art at lower computational cost.
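The sparse gating idea described in the abstract can be illustrated with a minimal sketch. It assumes plain top-k selection followed by a softmax over the selected logits, which is one simple way to obtain a sparse combination of experts; it is not claimed to reproduce the paper's exact gating function and it leaves out training and any load balancing. All names here (top_k_gates, make_expert, moe_forward) and all sizes are hypothetical, chosen only for illustration.

```python
import math
import random


def top_k_gates(logits, k):
    """Softmax over the k largest logits; all other experts get no weight.

    Returns a dict {expert_index: gate_weight}, one entry per selected expert.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def matvec(weights, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def make_expert(rng, d_in, d_hidden, d_out):
    """Build one small feed-forward expert: linear, ReLU, linear (assumed shape)."""
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(d_hidden)] for _ in range(d_out)]

    def expert(x):
        hidden = [max(0.0, h) for h in matvec(w1, x)]
        return matvec(w2, hidden)

    return expert


def moe_forward(x, gate_weights, experts, k):
    """Evaluate only the k selected experts and mix their outputs by gate weight."""
    gates = top_k_gates(matvec(gate_weights, x), k)
    output = None
    for index, gate in gates.items():
        scaled = [gate * v for v in experts[index](x)]
        output = scaled if output is None else [a + b for a, b in zip(output, scaled)]
    return output, gates


# Illustrative sizes only: 16 experts, 2 of them active per example.
rng = random.Random(0)
n_experts, d_in, d_hidden, d_out = 16, 4, 8, 3
experts = [make_expert(rng, d_in, d_hidden, d_out) for _ in range(n_experts)]
gate_weights = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(n_experts)]

y, gates = moe_forward([0.5, -1.0, 2.0, 0.1], gate_weights, experts, k=2)
print(sorted(gates))  # indices of the two experts that were actually evaluated
print(y)              # mixed output vector of length d_out
```

In this sketch the parameter count grows with the number of experts, while the per-example cost depends only on k, since unselected experts are never evaluated. That is the sense in which capacity can grow without a proportional increase in computation.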