2012

Large Scale Distributed Deep Networks

Jeffrey Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Quoc V. Le, Mark Z. Mao, Marc'aurelio Ranzato, A. Senior, P. Tucker, K. Yang, Andrew Y. Ng

AI summary

This paper introduces DistBelief, a software framework for distributed training of deep networks, and presents Downpour SGD and Sandblaster L-BFGS, two algorithms that increase the scale and speed of deep network training and achieve state-of-the-art results on ImageNet.

Main Contributions

  • Introduces DistBelief, a software framework for parallel distributed training of deep networks.
  • Presents Downpour SGD, a highly asynchronous variant of SGD that works surprisingly well for training nonconvex deep learning models.
  • Introduces Sandblaster L-BFGS, a distributed implementation of L-BFGS that can be competitive with SGD.
  • Achieves a cross-validated classification accuracy of over 15% on the 21k-category ImageNet object classification task, using a model with over 1 billion parameters.

Abstract

Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network that is 30x larger than previously reported in the literature and that achieves state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.
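The abstract describes Downpour SGD only as an asynchronous SGD procedure with many model replicas. To make that idea concrete, here is a minimal single-process sketch, assuming a shared parameter store that replicas read from and write to without waiting for each other. It is not DistBelief's API: the names `ParameterServer`, `fetch`, `push`, `fetch_every`, `make_shards`, and `downpour_sketch` are hypothetical, the model is a toy linear regression, and asynchrony is simulated by picking replicas in random order and letting each one compute gradients on a possibly stale copy of the parameters.

```python
import random


class ParameterServer:
    """Holds the shared parameters and applies gradients as they arrive."""

    def __init__(self, dim, lr):
        self.w = [0.0] * dim
        self.lr = lr

    def fetch(self):
        # Replicas receive a copy, which goes stale as other replicas push.
        return list(self.w)

    def push(self, grad):
        for i, g in enumerate(grad):
            self.w[i] -= self.lr * g


def gradient(w, batch):
    """Gradient of mean squared error for a linear model y = w . x."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad


def make_shards(true_w, n_shards, per_shard, seed=0):
    """Synthetic noise-free regression data, one shard per replica."""
    rng = random.Random(seed)
    shards = []
    for _ in range(n_shards):
        shard = []
        for _ in range(per_shard):
            x = [rng.uniform(-1.0, 1.0) for _ in true_w]
            y = sum(wi * xi for wi, xi in zip(true_w, x))
            shard.append((x, y))
        shards.append(shard)
    return shards


def downpour_sketch(shards, dim, lr=0.02, steps=2000, fetch_every=3, seed=1):
    """Simulated asynchronous SGD: replicas update shared parameters in
    arbitrary order, each working from its own (possibly stale) copy."""
    rng = random.Random(seed)
    server = ParameterServer(dim, lr)
    local = [server.fetch() for _ in shards]  # per-replica parameter copies
    since_fetch = [0] * len(shards)
    for _ in range(steps):
        r = rng.randrange(len(shards))        # whichever replica finishes next
        if since_fetch[r] >= fetch_every:     # refresh only occasionally
            local[r] = server.fetch()
            since_fetch[r] = 0
        batch = rng.sample(shards[r], 4)      # mini-batch from this replica's shard
        server.push(gradient(local[r], batch))
        since_fetch[r] += 1
    return server.w


true_w = [2.0, -3.0]
shards = make_shards(true_w, n_shards=4, per_shard=20)
w = downpour_sketch(shards, dim=2)
print(w)
```

In this toy setting the learned weights should land close to `true_w` even though every gradient is computed against parameters that other replicas may already have changed. A real deployment would run replicas on separate machines and shard the parameter store; both are left out here to keep the sketch short.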
