2017

Adversarial Examples for Evaluating Reading Comprehension Systems

R. Jia, Percy Liang

citations

Cite Score

54

AI summary

This paper introduces an adversarial evaluation scheme for the Stanford Question Answering Dataset (SQuAD). It shows that existing reading comprehension models are vulnerable to adversarial examples in which automatically generated distracting sentences are inserted into the input paragraph without changing the correct answer. In this setting, the average F1 score of sixteen published models drops from 75% to 36%.

Main Contributions

  • Proposes an adversarial evaluation scheme for SQuAD.
  • Demonstrates that existing reading comprehension models are vulnerable to adversarial examples.
  • Shows that inserting grammatical adversarial sentences reduces the average F1 score of sixteen published models from 75% to 36%.
  • Finds that allowing the adversary to add ungrammatical sequences of words lowers the average F1 score of four models further, to 7%.
  • Releases code and data publicly to encourage the development of new models that understand language more precisely.

Abstract

Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear. To reward systems with real language understanding abilities, we propose an adversarial evaluation scheme for the Stanford Question Answering Dataset (SQuAD). Our method tests whether systems can answer questions about paragraphs that contain adversarially inserted sentences, which are automatically generated to distract computer systems without changing the correct answer or misleading humans. In this adversarial setting, the accuracy of sixteen published models drops from an average of 75% F1 score to 36%; when the adversary is allowed to add ungrammatical sequences of words, average accuracy on four models decreases further to 7%. We hope our insights will motivate the development of new models that understand language more precisely.
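
The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' released code. It assumes a SQuAD-style token-overlap F1, and it scores a model on the worst case over a list of candidate distractor sentences appended to the paragraph; both choices are assumptions of this sketch. The names `normalize`, `token_f1`, `adversarial_f1`, and `toy_model` are hypothetical, and the paragraph, question, and distractor are invented toy data.

```python
import re
import string
from collections import Counter
from typing import Callable, List

def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return re.sub(r"\b(a|an|the)\b", " ", text).split()

def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def adversarial_f1(model: Callable[[str, str], str], paragraph: str,
                   question: str, gold: str, distractors: List[str]) -> float:
    """Worst-case F1 when one distractor at a time is appended (assumption)."""
    if not distractors:
        return token_f1(model(paragraph, question), gold)
    return min(token_f1(model(paragraph + " " + d, question), gold)
               for d in distractors)

def toy_model(paragraph: str, question: str) -> str:
    """Hypothetical stand-in for a real reader: pick the sentence sharing the
    most words with the question (ties go to the later sentence) and return
    its first capitalized word after the sentence-initial one."""
    q = set(normalize(question))
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
    if not sentences:
        return ""
    _, best = max(enumerate(sentences),
                  key=lambda p: (len(q & set(normalize(p[1]))), p[0]))
    words = best.split()
    for w in words[1:]:
        if w[0].isupper():
            return w.strip(string.punctuation)
    return words[0].strip(string.punctuation)

# Invented toy data: the distractor mimics the question's wording
# but is about a different person, so the correct answer is unchanged.
paragraph = "Ada moved to Lyon in 1990."
question = "What city did Ada move to in 1990?"
gold = "Lyon"
distractors = ["Bob did move to the city of Oslo in 1991."]

clean_score = token_f1(toy_model(paragraph, question), gold)
adv_score = adversarial_f1(toy_model, paragraph, question, gold, distractors)
print(clean_score, adv_score)
```

The toy reader relies on word overlap with the question, so a distractor that echoes the question's wording pulls it toward the wrong sentence even though a human reader would still answer "Lyon". Averaging such per-question scores over a dataset would give the kind of clean versus adversarial F1 comparison the abstract reports.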

Cited by 11 papers in your library

Cites 13 papers in your library

Read on November 4, 2025
