2011

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

John Duchi, Elad Hazan, Yoram Singer

citations

Cite Score

89

AI summary

This paper introduces Adaptive Subgradient Methods (ADAGRAD) for online learning and stochastic optimization, dynamically adjusting learning rates based on data geometry, achieving improved regret guarantees and outperforming non-adaptive methods on various datasets like ImageNet and RCV1.

Main Contributions

  • Introduces ADAGRAD, a new family of subgradient methods for online learning and stochastic optimization.
  • The methods dynamically incorporate knowledge of data geometry to perform more informative gradient-based learning.
  • Proposes an apparatus for adaptively modifying the proximal function, simplifying learning rate setting and offering improved regret guarantees.
  • Provides efficient algorithms for empirical risk minimization with common regularization functions and domain constraints.
  • Demonstrates that ADAGRAD methods outperform state-of-the-art non-adaptive subgradient algorithms in experimental studies.

Abstract

We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.

Citation Graph

Loading graph...

References [44]

Sort:
Filter:

J. Deng, W. Dong, Richard Socher, L. J. Li, K. Li, Li Fei Fei - 2009

28 papers in library cite

G. Salton, C. Buckley - 1988

2 papers in library cite

D. Grangier, Samy Bengio - 2008

2 papers in library cite

G. Lan - 2010

2 papers in library cite

M. Zinkevich - 2003

2 papers in library cite

K. Crammer, O. Dekel, J. Keshet, S. S. Shwartz, Yoram Singer - 2006

2 papers in library cite

D. D. Lewis, Yining Yang, T. G. Rose, F. Li - 2004

2 papers in library cite

Antoine Bordes, Leon Bottou, P. Gallinari - 2009

2 papers in library cite

K. Bache, M. Lichman - 2013

2 papers in library cite

R. Fletcher - 1970

1 paper in library cites

N. C. Bianchi, A. Conconi, C. Gentile - 2005

1 paper in library cites

P. Auer, C. Gentile - 2000

1 paper in library cites

H. B. Mcmahan, M. Streeter - 2010

1 paper in library cites

P. L. Bartlett, Elad Hazan, A. Rakhlin - 2007

1 paper in library cites

K. Crammer, Mark Dredze, A. Kulesza - 2009

1 paper in library cites

P. M. Pardalos, J. B. Rosen - 1990

1 paper in library cites

P. Brucker - 1984

1 paper in library cites

J. V. Bondar - 1994

1 paper in library cites

John Duchi, S. S. Shwartz, Yoram Singer, A. Tewari - 2010

1 paper in library cites

J. B. H. Urruty, C. Lemarechal - 1996

1 paper in library cites

S. Boyd, L. Vandenberghe - 2004

1 paper in library cites

A. Kalai, S. Vempala - 2003

1 paper in library cites

John Duchi, Yoram Singer - 2009

1 paper in library cites

K. Crammer, Mark Dredze, Fernando Pereira - 2008

1 paper in library cites

Elad Hazan, S. Kale - 2008

1 paper in library cites

N. C. Bianchi, Y. Mansour, G. Stoltz - 2007

1 paper in library cites

G. Obozinski, B. Taskar, M. Jordan - 2007

1 paper in library cites

A. Rakhlin - 2009

1 paper in library cites

Reference title contains 'et al'

Elad Hazan, A. Kalai, S. Kale, Akshat Agarwal - 2006

1 paper in library cites

R. A. Horn, C. R. Johnson - 1985

1 paper in library cites

A. Beck, M. Teboulle - 2003

1 paper in library cites

C. Davis - 1963

1 paper in library cites

P. Tseng - 2008

1 paper in library cites

N. C. Bianchi, A. Conconi, C. Gentile - 2004

1 paper in library cites

J. Abernethy, P. L. Bartlett, A. Rakhlin, A. Tewari - 2008

1 paper in library cites

Y. Nesterov - 2009

1 paper in library cites

A. S. Nemirovski, D. B. Yudin - 1983

1 paper in library cites

A. Nemirovski, A. Juditsky, G. Lan, A. Shapiro - 2009

1 paper in library cites

Y. Nesterov - 2005

1 paper in library cites

A. Juditsky, A. Nemirovski, C. Tauvel - 2008

1 paper in library cites

A. Nedic - 2002

1 paper in library cites

Cited by

19

papers in your library

Cites

2

papers in your library

Read

on February 15, 2026

Your review

Tags

Paper Aliases

No aliases