Papperoni

2007

Self-Taught Learning: Transfer Learning From Unlabeled Data

Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, A. Ng

Cite Score: 57

AI summary

This paper introduces "self-taught learning," a machine learning framework that uses unlabeled data to improve supervised classification even when that data does not share the class labels or generative distribution of the labeled data. The proposed approach applies sparse coding to the unlabeled data to extract higher-level features, which significantly improve classification performance on image, audio, and text tasks.

Main Contributions

Introduces "self-taught learning," a novel machine learning framework for using unlabeled data in supervised classification tasks without assuming shared class labels or generative distributions between labeled and unlabeled data.
Proposes an approach to self-taught learning using sparse coding to construct higher-level, succinct feature representations from unlabeled data.
Demonstrates that these learned features significantly improve classification performance when used with standard supervised algorithms like SVMs, across various modalities (images, audio, text).
Shows how a Fisher kernel can be learned for the sparse coding representation, further improving classification performance, particularly on handwritten character recognition.
Achieves competitive results on tasks such as Caltech 101 image classification, outperforming previous baselines and PCA-based methods.
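
For reference, the sparse coding step named in the contributions can be written as the standard L1-penalized reconstruction problem below. This is a sketch in common sparse coding notation, not text from this page: the symbols x_u (unlabeled inputs), x_l (a labeled input), b_j (basis vectors), a (activations), and β (sparsity weight) are assumed names.

```latex
% Step 1: learn basis vectors b_j from the k unlabeled examples x_u^{(i)}
\min_{b,\,a} \; \sum_{i=1}^{k} \Big\| x_u^{(i)} - \sum_{j} a_j^{(i)} b_j \Big\|_2^2
  + \beta \, \big\| a^{(i)} \big\|_1
\quad \text{s.t.} \quad \| b_j \|_2 \le 1 \;\; \forall j

% Step 2: with the basis fixed, compute features for each labeled example x_l
\hat{a}(x_l) = \arg\min_{a} \; \Big\| x_l - \sum_{j} a_j b_j \Big\|_2^2 + \beta \, \| a \|_1
```

The activations \hat{a}(x_l) then replace the raw inputs as the representation passed to the supervised classifier.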

Abstract

We present a new machine learning framework called "self-taught learning" for using unlabeled data in supervised classification tasks. We do not assume that the unlabeled data follows the same class labels or generative distribution as the labeled data. Thus, we would like to use a large number of unlabeled images (or audio samples, or text documents) randomly downloaded from the Internet to improve performance on a given image (or audio, or text) classification task. Such unlabeled data is significantly easier to obtain than in typical semi-supervised or transfer learning settings, making self-taught learning widely applicable to many practical learning problems. We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. These features form a succinct input representation and significantly improve classification performance. When using an SVM for classification, we further show how a Fisher kernel can be learned for this representation.
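
The two-stage idea in the abstract (learn a basis from unlabeled data, then re-encode labeled data as sparse activations) can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the function names (`sparse_encode`, `learn_dictionary`, `objective`) are hypothetical, the encoder uses ISTA, and the basis update is a simple ridge least-squares step followed by a projection onto the unit ball. It is not the authors' implementation, which relies on the solvers from their "Efficient Sparse Coding Algorithms" paper.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(X, B, A, beta):
    """Sparse coding cost: squared reconstruction error plus L1 penalty."""
    return np.sum((X - A @ B.T) ** 2) + beta * np.sum(np.abs(A))

def sparse_encode(X, B, beta, n_iter=100):
    """ISTA for min_a ||x - B a||^2 + beta * ||a||_1, one row of A per row of X.

    X has shape (n, d), B has shape (d, k); returns A with shape (n, k).
    """
    # Lipschitz constant of the gradient of the smooth term.
    lipschitz = 2.0 * np.linalg.norm(B, 2) ** 2 + 1e-12
    A = np.zeros((X.shape[0], B.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * (A @ B.T - X) @ B
        A = soft_threshold(A - grad / lipschitz, beta / lipschitz)
    return A

def learn_dictionary(X_unlabeled, k, beta, n_outer=20, seed=0):
    """Alternate between sparse encoding and a basis update.

    The basis update is a heuristic (ridge least squares, then shrinking any
    column whose norm exceeds 1), chosen for brevity rather than exactness.
    """
    rng = np.random.default_rng(seed)
    d = X_unlabeled.shape[1]
    B = rng.standard_normal((d, k))
    B /= np.linalg.norm(B, axis=0, keepdims=True)
    for _ in range(n_outer):
        A = sparse_encode(X_unlabeled, B, beta)
        gram = A.T @ A + 1e-6 * np.eye(k)
        B = np.linalg.solve(gram, A.T @ X_unlabeled).T
        B /= np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1.0)
    return B

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_unlabeled = rng.standard_normal((500, 16))  # stand-in for unlabeled data
    X_labeled = rng.standard_normal((20, 16))     # stand-in for labeled inputs
    B = learn_dictionary(X_unlabeled, k=32, beta=0.5)
    features = sparse_encode(X_labeled, B, beta=0.5)
    print(features.shape, float(np.mean(features == 0.0)))
```

In a full pipeline the rows of `features` would be passed, together with the labels, to a standard supervised classifier such as an SVM; that step is omitted here to keep the sketch dependency-free.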

References [23]

[1]Reducing the Dimensionality of Data With Neural Networks

Geoffrey Hinton, Ruslan Salakhutdinov - 2006

37 papers in library cite

I didn't like the way this is written; it's very hard to understand without a ton of background knowledge. But hey, it's the first deep learning model!

[2]Multitask Learning

Rich Caruana - 1997

13 papers in library cite

I expected waaaaaay more from this paper. The idea is sooooo simple and the results are underwhelming. Also, 30 pages for something that could be said in 10. The writing style is a bit boring. TBH it seems like it's just a re-writing of Caruana's PhD thesis.

[3]Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

Svetlana Lazebnik, Cordelia Schmid, Jean Ponce - 2006

14 papers in library cite

It's a fun read, but in the end it's just an application of the spatial pyramid matching kernel from the other paper.

[4]Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images

B. Olshausen, D. Field - 1996

5 papers in library cite

I can see the contribution, but overall the paper was uninteresting.

[5]Learning Generative Visual Models From Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories

Li Fei-Fei, Rob Fergus, Pietro Perona - 2004

15 papers in library cite

I think most people cite this thinking this is where the Caltech 101 dataset comes from (it's not). Anyway, it's just an extension of the other dataset and it's very mathy, not NNs, and uninteresting.

[6]A Framework for Learning Predictive Structures From Multiple Tasks and Unlabeled Data

Rie Kubota Ando, Tong Zhang - 2005

10 papers in library cite

Very nice and clever way of solving the problem of semi-supervised learning, and makes a lot of sense. I give them more credit for formalizing the concept. The methodology is a bit boring.

[7]Is Learning the N-Th Thing Any Easier Than Learning the First?

Sebastian Thrun - 1996

3 papers in library cite

It's one of those early NN papers that only have toy examples and some very non-standard terminology. TBH doesn't add much.

[8]Latent Dirichlet Allocation

D. M. Blei, Andrew Y. Ng, Michael I. Jordan - 2003

10 papers in library cite

30 pages; LDA

[9]Indexing by Latent Semantic Analysis

S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, R. Harshman - 1990

12 papers in library cite

LSA paper

[10]A Global Geometric Framework for Nonlinear Dimensionality Reduction

J. Tenenbaum, V. de Silva, John Langford - 2000

7 papers in library cite

[11]Object Recognition With Features Inspired by Visual Cortex

T. Serre, Lior Wolf, T. Poggio - 2005

7 papers in library cite

[12]Efficient Sparse Coding Algorithms

Honglak Lee, Alexis Battle, Rajat Raina, A. Ng - 2007

6 papers in library cite

[13]SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition

Hao Zhang, A. C. Berg, M. Maire, Jitendra Malik - 2006

6 papers in library cite

[14]Nonlinear Dimensionality Reduction by Locally Linear Embedding

S. T. Roweis, L. K. Saul - 2000

5 papers in library cite

[15]Regression Shrinkage and Selection via the Lasso

R. Tibshirani - 1996

4 papers in library cite

[16]Text Classification From Labeled and Unlabeled Documents Using EM

K. Nigam, A. K. McCallum, Sebastian Thrun, T. Mitchell - 2000

4 papers in library cite

[17]Exploiting Generative Models in Discriminative Classifiers

T. Jaakkola, D. Haussler - 1999

3 papers in library cite

[18]Combining Generative Models and Fisher Kernels for Object Class Recognition

Alex Holub, M. Welling, Pietro Perona - 2005

2 papers in library cite

[19]Feature Selection, L1 vs. L2 Regularization, and Rotational Invariance

Andrew Y. Ng - 2004

2 papers in library cite

[20]Least Angle Regression

B. Efron, T. Hastie, I. Johnstone, R. Tibshirani - 2004

1 paper in library cites

[21]Marginalized Kernels for Biological Sequences

K. Tsuda, T. Kin, K. Asai - 2002

1 paper in library cites

[22]Non-Negative Matrix Factorization With Sparseness Constraints

P. O. Hoyer - 2004

1 paper in library cites

[23]Theoretical Models of Learning to Learn

J. Baxter - 1997

1 paper in library cites

Cited by 7 papers in your library

Cites 9 papers in your library

Read on January 30, 2026

It's nice that it draws the distinction between self-taught learning and semi-supervised learning. However, it doesn't talk about NNs. Overall, LeCun's CNN unsupervised learning was better.
