Papperoni

2009

Curriculum Learning

Yoshua Bengio, J. Louradour, Ronan Collobert, Jason Weston


Cite Score: 79

AI summary

This paper introduces curriculum learning, a training strategy in which examples are presented in a meaningful order that gradually introduces more concepts and more complex ones. The authors report improved generalization and faster convergence, and hypothesize that the strategy also leads to better local minima. Experiments with neural networks cover vision (shape recognition) and language tasks.

Main Contributions

Formalizes curriculum learning as a training strategy in machine learning.
Demonstrates improved generalization and faster convergence through curriculum learning.
Hypothesizes that curriculum learning helps find better local minima in non-convex optimization.
Introduces simple multi-stage curriculum strategies for vision and language tasks.
Argues that curriculum learning can act as a regularizer.
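
A multi-stage curriculum of the kind listed above can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: it assumes a user-supplied difficulty score, and the names `curriculum_stages` and the sentence-length difficulty measure are hypothetical choices made for the example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    examples: Sequence[T],
    difficulty: Callable[[T], float],
    n_stages: int,
) -> List[List[T]]:
    """Return n_stages training pools of growing difficulty.

    Stage k (1-based) holds the easiest ceil(k / n_stages * N) examples,
    so the last stage is the full training set.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    ordered = sorted(examples, key=difficulty)  # easiest first
    n = len(ordered)
    stages = []
    for k in range(1, n_stages + 1):
        cutoff = -(-k * n // n_stages)  # ceiling division
        stages.append(ordered[:cutoff])
    return stages


# Hypothetical usage: word count stands in for difficulty (an assumption).
sentences = ["a b c d e f", "a b", "a b c d", "a"]
stages = curriculum_stages(
    sentences, difficulty=lambda s: len(s.split()), n_stages=2
)
for i, pool in enumerate(stages, start=1):
    print(f"stage {i}: {pool}")
```

A model would then be trained on each pool in turn, carrying its parameters from one stage to the next. Keeping the easy examples in later pools is one design choice among several; replacing them with harder ones at each stage is another.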

Abstract

Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". In the context of recent research studying the difficulty of training in the presence of non-convex training criteria (for deep deterministic and stochastic neural networks), we explore curriculum learning in various set-ups. The experiments show that significant improvements in generalization can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of non-convex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
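
The continuation-method view mentioned at the end of the abstract can be written compactly. The block below is a sketch whose notation should be treated as an assumption and checked against the PDF: a family of criteria indexed by lambda moves from a smoothed, easier problem at lambda = 0 to the target criterion at lambda = 1, and a curriculum reweights the training distribution P(z) so that examples gain weight as lambda grows.

```latex
% Continuation: follow a (local) minimizer while \lambda goes from 0 to 1.
\theta_{\lambda} \in \arg\min_{\theta} C_{\lambda}(\theta),
\qquad C_{0} \text{ smoothed and easy to optimize},
\qquad C_{1} \text{ the target criterion}

% Curriculum: a reweighted training distribution over examples z.
Q_{\lambda}(z) \propto W_{\lambda}(z)\, P(z),
\qquad 0 \le W_{\lambda}(z) \le 1,
\qquad Q_{1}(z) = P(z)

% Weights never decrease and the entropy of Q_\lambda grows with \lambda.
W_{\lambda + \epsilon}(z) \ge W_{\lambda}(z) \quad \forall z,
\qquad H(Q_{\lambda}) < H(Q_{\lambda + \epsilon}) \quad \forall \epsilon > 0
```

Read this way, early training sees a low-entropy distribution concentrated on easy examples, and the full training distribution is recovered at lambda = 1.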


References [30]


[1]Reducing the Dimensionality of Data With Neural Networks

Geoffrey Hinton, Ruslan Salakhutdinov - 2006

37 papers in library cite

I didn't like the way this is written; it's very hard to understand without a ton of background knowledge. But hey, it's the first deep learning model!

[2]A Fast Learning Algorithm for Deep Belief Nets

Geoffrey E. Hinton, S. Osindero, Y. Teh - 2006

43 papers in library cite

The paper does not explain anything. It just throws out the idea and a bunch of math, but doesn't really bother to explain the concepts.

[3]Learning Deep Architectures for AI

Yoshua Bengio - 2009

25 papers in library cite

It's a nice overview. Some sections get very theoretical, but the first half is very good and I feel that it does a waaaay better job of explaining RBMs and DBNs than other papers. This feels like Bengio is taking your hand and saying "if you don't know what's going on, here you go, everything you need to know to jump into the deep nets train"

[4]A Neural Probabilistic Language Model

Yoshua Bengio, R. Ducharme, Pascal Vincent - 2001

62 papers in library cite

What started it all. Very simple and elegant.

[5]Extracting and Composing Robust Features With Denoising Autoencoders

Pascal Vincent, Hugo Larochelle, Yoshua Bengio, Pierre Antoine Manzagol - 2008

25 papers in library cite

I am *so* glad we found an alternative to DBNs. Also, introduced the idea of denoising which is nice.

[6]A Unified Architecture for Natural Language Processing: Deep Neural Networks With Multitask Learning

Ronan Collobert, Jason Weston - 2008

32 papers in library cite

Really did not add much to the game. I think this was more of a small performance improvement over existing approaches, and it set a few methodological standards. Maybe the main contribution is combining multitask learning with deep learning.

[7]Greedy Layer-Wise Training of Deep Networks

Yoshua Bengio, P. Lamblin, D. Popovici, Hugo Larochelle - 2006

33 papers in library cite

Bengio is perfect. This is everything that Hinton's paper hoped to be. Very well explained, and also tying back to real use cases (not just "hey, the math works and it reduced the score")

[8]Learning and Development in Neural Networks: The Importance of Starting Small

Jeffrey L. Elman - 1993

5 papers in library cite

This is such a nice paper! Maybe it's because it's written for a specific audience, but it's such an easy read and ties in nicely with neural/biological concepts!

[9]Efficient Learning of Sparse Representations With an Energy-Based Model

Marc'aurelio Ranzato, C. Poultney, S. Chopra, Yann Lecun - 2006

20 papers in library cite

It's ok. Not really good, but alright.

[10]An Empirical Evaluation of Deep Architectures on Problems With Many Factors of Variation

Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, Yoshua Bengio - 2007

13 papers in library cite

Good paper showing promising results for Deep Learning. Nothing amazing but good nonetheless

[11]Deep Learning via Semi-Supervised Embedding

Jason Weston, F. Ratle, Ronan Collobert - 2008

10 papers in library cite

It's a good paper and a nice idea, but it seems overly complicated and I don't think it's widely used... (PS: this was republished in 2012)

[12]The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training

Dumitru Erhan, Pierre Antoine Manzagol, Yoshua Bengio, Samy Bengio, Pascal Vincent - 2009

5 papers in library cite

Very nice analysis of why unsupervised pretraining works!

[13]Flexible Shaping: How Learning in Small Steps Helps

K. A. Krueger, Peter Dayan - 2009

1 paper in library cites

Good paper overall, fun to read, but it's just a toy example so results are very meh.

[14]Connectionist Language Modeling for Large Vocabulary Continuous Speech Recognition

Holger Schwenk, Jean Luc Gauvain - 2002

14 papers in library cite

Only real relevance is being early. Otherwise not much to see.

[15]Sparse Feature Learning for Deep Belief Networks

Marc'aurelio Ranzato, Y. Boureau, Yann Lecun - 2008

12 papers in library cite

[16]Unsupervised Learning of Distributions on Binary Vectors Using Two Layer Networks

Y. Freund, D. Haussler - 1992

8 papers in library cite

[17]On the Power of Small-Depth Threshold Circuits

J. Hastad, M. Goldmann - 1991

7 papers in library cite

[18]Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure

Ruslan Salakhutdinov, Geoffrey Hinton - 2007

5 papers in library cite

[19]Restricted Boltzmann Machines for Collaborative Filtering

Ruslan Salakhutdinov, A. Mnih, Geoffrey E. Hinton - 2007

5 papers in library cite

[20]Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes

Ruslan Salakhutdinov, Geoffrey E. Hinton - 2008

4 papers in library cite

[21]A Day of Great Illumination: B. F. Skinner's Discovery of Shaping

G. B. Peterson - 2004

3 papers in library cite

[22]Active Learning With Statistical Models

D. Cohn, Zoubin Ghahramani, M. Jordan - 1995

2 papers in library cite

[23]Explanation-Based Neural Network Learning: A Lifelong Learning Approach

Sebastian Thrun - 1996

2 papers in library cite

[24]Global Continuation for Distance Geometry Problems

Zhijun Wu - 1997

2 papers in library cite

[25]Language Acquisition in the Absence of Explicit Negative Evidence: How Important Is Starting Small?

D. Rohde, D. Plaut - 1999

2 papers in library cite

[26]Numerical Continuation Methods. An Introduction

E. L. Allgower, K. Georg - 1980

2 papers in library cite

[27]Parallel Continuation-Based Global Optimization for Molecular Conformation and Protein Folding

T. Coleman, Zhijun Wu - 1994

2 papers in library cite

[28]Reinforcement Today

B. F. Skinner - 1958

2 papers in library cite

[29]Generalization in the Programed Teaching of a Perceptron

I. Derenyi, T. Geszti, G. Gyorgyi - 1994

1 paper in library cites

[30]Neural Network Learning Control of Robot Manipulators Using Gradually Increasing Task Difficulty

T. D. Sanger - 1994

1 paper in library cites

Cited by 6 papers in your library

Cites 15 papers in your library

Read on March 26, 2025

Very nice paper that introduces curriculum learning. Possibly not too relevant, but good nonetheless.
