2012

Random Search for Hyper-Parameter Optimization

James Bergstra, Yoshua Bengio

AI summary

This paper shows that random search is more efficient than grid search for hyper-parameter optimization, and that it matches or beats a combination of manual and grid search on five of seven data sets. The experiments cover neural networks and deep belief networks. A Gaussian process analysis indicates that only a few hyper-parameters matter on most data sets, that is, the search problem has low effective dimensionality.

Main Contributions

Empirically and theoretically showed that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid.
Demonstrated that, for neural networks, random search over the same domain finds models as good as or better than those found by pure grid search, in a small fraction of the computation time.
For deep belief networks, random search over the same 32-dimensional configuration space was statistically equal on four of seven data sets and superior on one, compared with a thoughtful combination of manual and grid search.
Used a Gaussian process analysis to show that, for most data sets, only a few hyper-parameters really matter, and that the important ones differ from data set to data set.
Proposed random search as a natural baseline for evaluating adaptive hyper-parameter optimization algorithms.
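
The contrast between a grid and random trials when only a few hyper-parameters matter can be illustrated with a small Python sketch. It is not code from the paper: the objective `toy_score`, its optimum at 0.37, and the two-dimensional unit-square search space are assumptions chosen purely for illustration. With the same budget of nine trials, the grid tests only three distinct values of the hyper-parameter that matters, while random sampling tests nine.

```python
import itertools
import random


def toy_score(x, y):
    # Hypothetical objective (an assumption, not from the paper):
    # only x affects the score, so the effective dimensionality is 1.
    return -(x - 0.37) ** 2


def grid_trials(points_per_axis):
    # Evenly spaced values in [0, 1] on each axis, combined exhaustively.
    axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    return list(itertools.product(axis, axis))


def random_trials(n_trials, seed=0):
    # Independent uniform draws in the unit square.
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_trials)]


def best_score(trials):
    return max(toy_score(x, y) for x, y in trials)


def distinct_x(trials):
    # Number of different values tried for the hyper-parameter that matters.
    return len({x for x, _ in trials})


grid = grid_trials(3)             # 3 x 3 = 9 trials
rand = random_trials(len(grid))   # same budget of 9 trials

print("grid:  ", distinct_x(grid), "distinct x values, best score", best_score(grid))
print("random:", distinct_x(rand), "distinct x values, best score", best_score(rand))
```

The same effect grows with dimension: a grid with k values per axis in d dimensions costs k^d trials but still explores only k values of any single hyper-parameter.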

Abstract

Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success: they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
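
As a rough illustration of random search used as a baseline, the following Python sketch draws each trial independently from a fixed distribution over configurations and keeps the best one. Everything specific here is hypothetical: the hyper-parameter names and ranges in `sample_config`, and the `validation_error` function, which stands in for training a model and measuring its validation error. None of it is taken from the paper's experiments.

```python
import math
import random


def sample_config(rng):
    # Hypothetical search space; names and ranges are illustrative only.
    return {
        "learning_rate": 10 ** rng.uniform(-4, 0),   # log-uniform in [1e-4, 1]
        "n_hidden": rng.randint(16, 1024),           # uniform over integers
        "l2_penalty": 10 ** rng.uniform(-7, -3),     # log-uniform in [1e-7, 1e-3]
    }


def validation_error(config):
    # Stand-in for "train a model, then evaluate on the validation set".
    # Dominated by the learning rate; l2_penalty is deliberately ignored.
    lr_term = (math.log10(config["learning_rate"]) + 2.0) ** 2
    size_term = 0.01 * abs(config["n_hidden"] - 256) / 256
    return lr_term + size_term


def random_search(n_trials, seed=0):
    # Trials are independent of one another, so nothing learned from
    # earlier trials influences later ones (non-adaptive search).
    rng = random.Random(seed)
    best_config, best_error = None, math.inf
    for _ in range(n_trials):
        config = sample_config(rng)
        error = validation_error(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


config, error = random_search(n_trials=64)
print("best configuration:", config)
print("validation error:  ", error)
```

Because the trials are independent, they can run in parallel and the budget can be extended at any time by drawing more samples. An adaptive (sequential) method would be judged by whether it reaches a lower error than this loop under the same trial budget.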
