2012

Random Search for Hyper-Parameter Optimization

James Bergstra, Yoshua Bengio

Cite Score: 89

AI summary

This paper shows that random search is more efficient than grid search for hyper-parameter optimization. Random search matches or beats grid-configured neural networks in a small fraction of the computation time, and matches or beats deep belief networks tuned by a combination of manual and grid search on five of seven data sets. A Gaussian process analysis suggests why: on most data sets only a few hyper-parameters really matter, so the search problem has low effective dimensionality.

Main Contributions

  • Empirically and theoretically showed that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid.
  • Demonstrated that random search over the same domain finds neural network models as good as or better than those found by pure grid search, in a small fraction of the computation time.
  • For deep belief networks, purely random search over a 32-dimensional configuration space was statistically equal on four of seven data sets, and superior on one, compared with a thoughtful combination of manual and grid search.
  • Used a Gaussian process analysis to show that on most data sets only a few hyper-parameters really matter, but that which ones matter differs from one data set to another.
  • Proposed random search as a natural baseline for evaluating adaptive hyper-parameter optimization algorithms.
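
The efficiency argument in the contributions above can be illustrated with a small sketch. It is not code from the paper: the toy `objective`, its hyper-parameter names (`lr`, `momentum`), and the optimum at 0.37 are all assumptions chosen only to mimic a function with low effective dimensionality, where one hyper-parameter matters and the other barely does. With the same budget of nine trials, a 3 x 3 grid tests only three distinct values of the important hyper-parameter, while random search tests nine.

```python
import itertools
import random


def objective(lr, momentum):
    """Hypothetical validation score: sensitive to lr, almost flat in momentum."""
    return -(lr - 0.37) ** 2 - 0.001 * (momentum - 0.5) ** 2


def grid_trials(points_per_axis):
    """All combinations of evenly spaced values on the unit square."""
    axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    return list(itertools.product(axis, axis))


def random_trials(n_trials, seed):
    """Independent uniform draws on the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_trials)]


def distinct_values(trials, dim):
    """Number of different values tried along one hyper-parameter."""
    return len({trial[dim] for trial in trials})


def best_score(trials):
    """Best objective value found among the trials."""
    return max(objective(lr, momentum) for lr, momentum in trials)


grid = grid_trials(3)            # 3 x 3 grid: 9 trials
rand = random_trials(9, seed=0)  # same budget: 9 trials

print("distinct lr values, grid:  ", distinct_values(grid, 0))  # 3
print("distinct lr values, random:", distinct_values(rand, 0))
print("best score, grid:  ", best_score(grid))
print("best score, random:", best_score(rand))
```

Whether random search wins on any single run depends on the seed; the point of the sketch is only that the grid spends two thirds of its budget re-testing values of `lr` it has already tried.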

Abstract

Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success: they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
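
One way to see why random search makes a reasonable baseline is a standard probability calculation, sketched below. It is not a result quoted in the abstract, and the 5% "good region" and 95% target are illustrative assumptions: if a fraction `good_fraction` of the search space (by volume) gives acceptable models, then `n_trials` independent uniform draws land in it at least once with probability 1 - (1 - good_fraction)^n_trials, regardless of how many hyper-parameters there are.

```python
import math


def hit_probability(n_trials, good_fraction):
    """Chance that at least one of n independent uniform trials is 'good'."""
    return 1.0 - (1.0 - good_fraction) ** n_trials


def trials_needed(good_fraction, target):
    """Smallest number of trials whose hit probability reaches the target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - good_fraction))


# Assumed example: the best 5% of configurations count as good enough.
print(round(hit_probability(60, 0.05), 3))  # 0.954
print(trials_needed(0.05, 0.95))            # 59
```

The calculation assumes trials are drawn independently and uniformly over the region of interest; it says nothing about how large the good region is for a real model.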
