2003

Video Google: A Text Retrieval Approach to Object Matching in Videos

Josef Sivic, Andrew Zisserman

citations

Cite Score

82

AI summary

This paper proposes a text retrieval approach to object matching in videos, representing objects by viewpoint-invariant region descriptors, using vector quantization and inverted file systems for efficient retrieval, and demonstrating the method on two full-length feature films.

Main Contributions

  • Presents an approach for object and scene retrieval in videos, localizing user-outlined objects.
  • Uses viewpoint-invariant region descriptors and the temporal continuity within video shots to track regions, rejecting unstable ones and reducing descriptor noise.
  • Implements a text retrieval analogy by pre-computing matches using vector quantization and inverted file systems for immediate retrieval and ranked results.
  • Uses two types of viewpoint-covariant regions, Shape Adapted (SA) and Maximally Stable (MS), whose descriptors are clustered separately to form two visual vocabularies.
  • Evaluates scene matching and object retrieval on two full-length feature films, 'Run Lola Run' and 'Groundhog Day', with retrieval that the authors describe as immediate.
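
The vector quantization and inverted file steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional descriptors, the three-word vocabulary, the frame names, and the helper names (`nearest_word`, `build_inverted_file`, `candidate_frames`) are all hypothetical, and plain squared Euclidean distance stands in for whatever distance the real system uses. Each descriptor is assigned to its nearest cluster centre (its "visual word"), and the inverted file maps every word to the frames that contain it, so a query only touches frames that share at least one word with it.

```python
from collections import defaultdict


def nearest_word(descriptor, vocabulary):
    """Return the index of the closest cluster centre (squared Euclidean distance)."""
    best_index, best_dist = -1, float("inf")
    for index, centre in enumerate(vocabulary):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centre))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def build_inverted_file(frames, vocabulary):
    """Map each visual word to the set of frame ids in which it occurs."""
    inverted = defaultdict(set)
    for frame_id, descriptors in frames.items():
        for descriptor in descriptors:
            inverted[nearest_word(descriptor, vocabulary)].add(frame_id)
    return inverted


def candidate_frames(query_descriptors, vocabulary, inverted):
    """Count the distinct query words each frame shares, most shared first."""
    votes = defaultdict(int)
    for word in {nearest_word(d, vocabulary) for d in query_descriptors}:
        for frame_id in inverted.get(word, ()):
            votes[frame_id] += 1
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Toy data (assumed for illustration only).
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = {
    "frame_a": [(0.1, -0.1), (0.9, 1.2)],
    "frame_b": [(4.8, 5.1)],
    "frame_c": [(1.1, 0.9), (5.2, 4.9)],
}
inverted = build_inverted_file(frames, vocabulary)
ranking = candidate_frames([(1.0, 0.8), (5.0, 5.0)], vocabulary, inverted)
print(ranking)  # [('frame_c', 2), ('frame_a', 1), ('frame_b', 1)]
```

Because the word assignment for every frame is computed once, ahead of any query, the query-time cost depends on the number of query words and the length of their posting lists rather than on the total number of frames.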

Abstract

We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject unstable regions and reduce the effects of noise in the descriptors. The analogy with text retrieval is in the implementation where matches on descriptors are pre-computed (using vector quantization), and inverted file systems and document rankings are used. The result is that retrieval is immediate, returning a ranked list of key frames/shots in the manner of Google. The method is illustrated for matching on two full length feature films.
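
The document ranking mentioned in the abstract can be sketched with the standard tf-idf weighting from text retrieval, treating each key frame as a document of visual-word ids. This is a hedged illustration, not the paper's code: the key-frame ids, the word ids, and the function names (`tfidf_vectors`, `cosine`, `rank`) are assumptions made for the example, and frames are scored by the cosine of the angle between the query vector and each frame vector.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Turn {doc_id: [word ids]} into sparse tf-idf vectors plus the idf table."""
    n_docs = len(documents)
    doc_freq = Counter()
    for words in documents.values():
        doc_freq.update(set(words))
    idf = {word: math.log(n_docs / df) for word, df in doc_freq.items()}
    vectors = {}
    for doc_id, words in documents.items():
        counts = Counter(words)
        vectors[doc_id] = {w: (c / len(words)) * idf[w] for w, c in counts.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_words, vectors, idf):
    """Score every document against the query and sort by decreasing similarity."""
    counts = Counter(w for w in query_words if w in idf)
    total = sum(counts.values())
    query = {w: (c / total) * idf[w] for w, c in counts.items()} if total else {}
    scores = {doc_id: cosine(query, vec) for doc_id, vec in vectors.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy key frames described by visual-word ids (assumed for illustration only).
key_frames = {"kf1": [0, 0, 1], "kf2": [1, 2], "kf3": [2, 2, 3]}
vectors, idf = tfidf_vectors(key_frames)
ranking = rank([0, 1], vectors, idf)
for frame_id, score in ranking:
    print(frame_id, round(score, 3))
```

In this toy example the query shares its rarest word with `kf1`, so `kf1` ranks first, while `kf3` shares no word with the query and scores zero. A practical system would combine this scoring with an inverted file so that frames with no shared words are never visited.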


