2014

Large-Scale Video Classification With Convolutional Neural Networks

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, Li Fei-Fei

Cite Score

82

AI summary

This paper evaluates CNNs for large-scale video classification on Sports-1M, a dataset of 1 million YouTube videos across 487 classes. It proposes a multiresolution, foveated architecture to speed up training, reports gains over feature-based baselines (55.3% to 63.9%), and shows that the best model transfers to UCF-101 when its top layers are retrained (63.3%, up from 43.9%).

Main Contributions

  • Extensive empirical evaluation of CNNs for large-scale video classification on Sports-1M.
  • Introduction of the Sports-1M dataset: 1 million YouTube videos across 487 sports classes.
  • A multiresolution architecture that processes input at two spatial resolutions, improving runtime performance without sacrificing accuracy.
  • Performance improvements over strong feature-based baselines on Sports-1M (55.3% to 63.9%), but only a modest gain over single-frame models (59.3% to 60.9%).
  • Improved results on UCF-101 through transfer learning: retraining the top layers of the best model reaches 63.3%, up from 43.9% for the UCF-101 baseline model.
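
The multiresolution idea in the list above can be illustrated with a minimal sketch. It assumes a square single-channel frame stored as a list of rows, a "context" input made from the whole frame at half resolution, and a "fovea" input made from the center region at full resolution. The function names and the 2x2 averaging used for downsampling are illustrative assumptions, not details taken from this page.

```python
def center_crop(frame, size):
    """Return the central size x size region of a 2-D frame (list of rows)."""
    height, width = len(frame), len(frame[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]


def downsample_by_two(frame):
    """Halve the resolution by averaging non-overlapping 2x2 blocks.

    Averaging is an assumption made for this sketch; any resampling
    method could be substituted.
    """
    height, width = len(frame) // 2, len(frame[0]) // 2
    return [
        [
            (frame[2 * r][2 * c] + frame[2 * r][2 * c + 1]
             + frame[2 * r + 1][2 * c] + frame[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(width)
        ]
        for r in range(height)
    ]


def multiresolution_inputs(frame):
    """Split one square frame into (context, fovea) inputs of equal size."""
    context = downsample_by_two(frame)        # whole frame, half resolution
    fovea = center_crop(frame, len(context))  # center region, full resolution
    return context, fovea


# Illustrative 8x8 frame whose pixel value encodes its position.
frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
context, fovea = multiresolution_inputs(frame)
```

In this sketch each stream holds a quarter of the original pixels, so the two inputs together cover half of the original input size, which is where a runtime saving would come from.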

Abstract

Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up the training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).
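
As a rough illustration of what extending a CNN's connectivity in time can mean, the sketch below stacks the channels of several consecutive frames into one multi-channel input, so that a first convolutional layer would see a short clip instead of a single image. This is only one possible scheme and is not claimed to be the authors' exact architecture; the helper names `sliding_clips` and `stack_frames`, the clip length, and the stride are all hypothetical.

```python
def sliding_clips(frames, length, stride):
    """Yield every window of `length` consecutive frames, `stride` frames apart."""
    for start in range(0, len(frames) - length + 1, stride):
        yield frames[start:start + length]


def stack_frames(clip):
    """Concatenate the channels of consecutive frames into one input.

    clip: list of T frames; each frame is a list of C channels (2-D grids).
    Returns a list of T * C channels, ordered frame by frame.
    """
    return [channel for frame in clip for channel in frame]


# Six toy RGB frames of 2x2 pixels; every pixel stores its frame index.
frames = [[[[float(t)] * 2 for _ in range(2)] for _ in range(3)] for t in range(6)]

clips = list(sliding_clips(frames, length=4, stride=2))
stacked = stack_frames(clips[0])  # 4 frames x 3 channels = 12 channels
```

A single-frame model corresponds to `length=1`, which makes the comparison in the abstract between single-frame and spatio-temporal networks easy to picture.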


