2005
Cite Score
57
AI summary
This paper introduces the Microsoft Research Paraphrase Corpus (MSRP), a dataset of 5801 hand-labeled sentence pairs, created using heuristic extraction and an SVM classifier, to address the lack of large-scale, publicly available paraphrase corpora.
Abstract
An obstacle to research in automatic paraphrase identification and generation is the lack of large-scale, publicly available labeled corpora of sentential paraphrases. This paper describes the creation of the recently released Microsoft Research Paraphrase Corpus, which contains 5801 sentence pairs, each hand-labeled with a binary judgment as to whether the pair constitutes a paraphrase. The corpus was created using heuristic extraction techniques in conjunction with an SVM-based classifier to select likely sentence-level paraphrases from a large corpus of topic-clustered news data. These pairs were then submitted to human judges, who confirmed that 67% were in fact semantically equivalent. In addition to describing the corpus itself, we explore a number of issues that arose in defining guidelines for the human raters.
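The abstract mentions heuristic extraction without giving details. As a rough illustration only, one plausible heuristic of this kind keeps sentence pairs whose word-level edit distance is small but nonzero and whose lengths are comparable. The function names and every threshold below are assumptions made for this sketch, not the settings used to build the corpus.

```python
def word_edit_distance(a, b):
    """Levenshtein distance computed over whitespace-separated tokens."""
    x, y = a.lower().split(), b.lower().split()
    prev = list(range(len(y) + 1))  # distance from empty prefix of x
    for i, xi in enumerate(x, start=1):
        curr = [i]
        for j, yj in enumerate(y, start=1):
            cost = 0 if xi == yj else 1
            curr.append(min(prev[j] + 1,          # delete a token from x
                            curr[j - 1] + 1,      # insert a token from y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def is_candidate(a, b, min_dist=1, max_dist=8, min_len_ratio=0.66):
    """Hypothetical pre-filter for paraphrase candidates.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    len_a, len_b = len(a.split()), len(b.split())
    if len_a == 0 or len_b == 0:
        return False
    ratio = min(len_a, len_b) / max(len_a, len_b)
    dist = word_edit_distance(a, b)
    # Reject identical pairs (dist == 0) and pairs that differ too much.
    return min_dist <= dist <= max_dist and ratio >= min_len_ratio
```

In a pipeline like the one the abstract outlines, pairs passing such a filter would then go to a classifier and finally to human judges; the filter only narrows the candidate pool.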