Papperoni

2007

Biographies, Bollywood, Boom-Boxes and Blenders: Domain Adaptation for Sentiment Classification

John Blitzer, Mark Dredze, Fernando Pereira


AI summary

This paper extends the Structural Correspondence Learning (SCL) algorithm to sentiment classification, reducing the relative error due to adaptation between domains by an average of 30% over the original SCL algorithm and 46% over a supervised baseline. It also introduces the A-distance as a measure of domain similarity, which can help select source domains whose classifiers transfer well. Experiments use Amazon product reviews in four categories (books, DVDs, electronics, kitchen appliances).

Main Contributions

Extended the Structural Correspondence Learning (SCL) algorithm for sentiment classification.
Proposed a new pivot selection method for SCL based on mutual information with source labels (SCL-MI), improving adaptation performance.
Introduced a method to correct feature misalignments using a small amount of labeled target domain data, achieving a 46% average relative reduction in error.
Identified and evaluated the A-distance as a measure of domain similarity that correlates with adaptation loss, which can be used to select optimal source domains for annotation.
Constructed and utilized a new dataset of Amazon product reviews across four different product types (books, DVDs, electronics, and kitchen appliances) for sentiment domain adaptation.
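The SCL-MI idea listed above (choosing pivots by mutual information with the source labels) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mutual_information` and `select_pivots`, the `min_count` threshold, the binary presence features, and the toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(present, labels):
    """Mutual information (in nats) between a binary feature and the labels."""
    n = len(labels)
    joint = Counter(zip(present, labels))
    p_feature = Counter(present)
    p_label = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((p_feature[x] / n) * (p_label[y] / n)))
    return mi


def select_pivots(source_docs, source_labels, target_docs, k, min_count=1):
    """Hypothetical helper: rank features seen in both domains by MI with source labels."""
    # Document frequency of each token in the source and target domains.
    src_df = Counter(tok for doc in source_docs for tok in set(doc))
    tgt_df = Counter(tok for doc in target_docs for tok in set(doc))
    # A pivot candidate must occur often enough in BOTH domains.
    candidates = [t for t in src_df
                  if src_df[t] >= min_count and tgt_df[t] >= min_count]
    scored = []
    for tok in candidates:
        present = [tok in doc for doc in source_docs]
        scored.append((mutual_information(present, source_labels), tok))
    # Highest MI first; ties broken alphabetically for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [tok for _, tok in scored[:k]]


# Toy data (invented): labeled book reviews as source, unlabeled kitchen reviews as target.
source_docs = [["the", "excellent", "book"], ["the", "excellent", "plot"],
               ["the", "poor", "book"], ["the", "poor", "plot"]]
source_labels = [1, 1, 0, 0]
target_docs = [["the", "excellent", "blender"], ["the", "poor", "motor"]]
print(select_pivots(source_docs, source_labels, target_docs, k=2))
```

In this toy setup, "the" occurs in both domains but carries no label information, so a frequency-only criterion could pick it while an MI-based ranking places it last.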

Abstract

Automatic sentiment classification has been extensively studied and applied in recent years. However, sentiment is expressed differently in different domains, and annotating corpora for every possible domain of interest is impractical. We investigate domain adaptation for sentiment classifiers, focusing on online reviews for different types of products. First, we extend to sentiment classification the recently-proposed structural correspondence learning (SCL) algorithm, reducing the relative error due to adaptation between domains by an average of 30% over the original SCL algorithm and 46% over a supervised baseline. Second, we identify a measure of domain similarity that correlates well with the potential for adaptation of a classifier from one domain to another. This measure could for instance be used to select a small set of domains to annotate whose trained classifiers would transfer well to many other domains.
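As a rough illustration of how a domain-similarity measure could guide the choice of source domain, the sketch below uses a commonly cited proxy for the A-distance, 2(1 - 2ε), where ε is the held-out error of a classifier trained to tell the two domains apart. The formula is stated here as an assumption rather than taken from this page, the paper's exact loss may differ, and the helper names and the distances in the example are hypothetical.

```python
def proxy_a_distance(domain_true, domain_pred):
    """Assumed proxy: 2 * (1 - 2 * error) of a source-vs-target domain classifier."""
    if not domain_true or len(domain_true) != len(domain_pred):
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(t != p for t, p in zip(domain_true, domain_pred))
    eps = errors / len(domain_true)
    return 2.0 * (1.0 - 2.0 * eps)


def closest_source(distances):
    """Hypothetical helper: pick the candidate source domain with the smallest distance."""
    return min(distances, key=distances.get)


# Domains that are easy to tell apart (zero error) give the maximum proxy value;
# domains that cannot be separated better than chance give zero.
far = proxy_a_distance([0, 0, 1, 1], [0, 0, 1, 1])
near = proxy_a_distance([0, 0, 1, 1], [0, 1, 1, 0])
print(far, near)

# Invented distances from three candidate sources to one target domain.
print(closest_source({"books": 1.6, "dvd": 1.7, "electronics": 0.9}))
```

Under this proxy, a smaller value means the domains are harder to distinguish, which is the situation in which a classifier trained on the source would be expected to transfer with less loss.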


References [13]


[1] A Framework for Learning Predictive Structures From Multiple Tasks and Unlabeled Data

Rie Kubota Ando, Tong Zhang - 2005

10 papers in library cite


Very nice and clever way of solving the problem of semi-supervised learning, and makes a lot of sense. I give them more credit for formalizing the concept. The methodology is a bit boring.

[2] Seeing Stars: Exploiting Class Relationships for Sentiment Categorization With Respect to Rating Scales

Bo Pang, L. Lee - 2005

13 papers in library cite


[3] Domain Adaptation With Structural Correspondence Learning

John Blitzer, R. McDonald, Fernando Pereira - 2006

4 papers in library cite


[4] Thumbs Up? Sentiment Classification Using Machine Learning Techniques

Bo Pang, L. Lee, S. Vaithyanathan - 2002

4 papers in library cite


[5] Analysis of Representations for Domain Adaptation

S. Ben-David, John Blitzer, K. Crammer, Fernando Pereira - 2006

3 papers in library cite


[6] Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

P. D. Turney - 2002

3 papers in library cite


[7] Get Out the Vote: Determining Support or Opposition From Congressional Floor-Debate Transcripts

M. Thomas, Bo Pang, L. Lee - 2006

2 papers in library cite


[8] Seeing Stars When There Aren't Many Stars: Graph-Based Semi-Supervised Learning for Sentiment Categorization

A. B. Goldberg, Xiaojin Zhu - 2006

2 papers in library cite


[9] Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms

Tong Zhang - 2004

2 papers in library cite


[10] Yahoo! For Amazon: Extracting Market Sentiment From Stock Message Boards

S. Das, Mike Chen - 2001

2 papers in library cite


[11] A Statistical Model for Multilingual Entity Detection and Tracking

R. Florian, H. Hassan, A. Ittycheriah, H. Jing, N. Kambhatla, X. Luo, N. Nicolov, S. Roukos - 2004

1 paper in library cites


[12] Adaptation of Maximum Entropy Capitalizer: Little Data Can Help a Lot

C. Chelba, Alex Acero - 2004

1 paper in library cites


[13] Customizing Sentiment Classifiers to New Domains: A Case Study

A. Aue, M. Gamon - 2005

1 paper in library cites



Read on January 26, 2026

It's alright. I think they focus a lot on methodology of prediction but most citations are actually because of the dataset.