Papperoni

2017

Adversarial Examples for Evaluating Reading Comprehension Systems

R. Jia, Percy Liang

citations

Cite Score

54

AI summary

This paper introduces an adversarial evaluation method for the Stanford Question Answering Dataset (SQuAD). It demonstrates that existing reading comprehension models are vulnerable to adversarial examples in which distracting sentences are added to the input paragraph. In this adversarial setting, the average F1 score of sixteen published models drops from 75% to 36%.

Main Contributions

Proposes an adversarial evaluation scheme for SQuAD.
Demonstrates that existing reading comprehension models are vulnerable to adversarial examples.
Shows that adding grammatical adversarial sentences reduces the average F1 score of sixteen models from 75% to 36%.
Finds that allowing ungrammatical sequences of English words drops the average F1 score further, to 7%, on four models.
Releases code and data publicly to encourage the development of new models that understand language more precisely.

Abstract

Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear. To reward systems with real language understanding abilities, we propose an adversarial evaluation scheme for the Stanford Question Answering Dataset (SQuAD). Our method tests whether systems can answer questions about paragraphs that contain adversarially inserted sentences, which are automatically generated to distract computer systems without changing the correct answer or misleading humans. In this adversarial setting, the accuracy of sixteen published models drops from an average of 75% F1 score to 36%; when the adversary is allowed to add ungrammatical sequences of words, average accuracy on four models decreases further to 7%. We hope our insights will motivate the development of new models that understand language more precisely.
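The evaluation idea in the abstract can be illustrated with a small sketch: score a model on a paragraph, add a distracting sentence that leaves the correct answer unchanged, and score it again. Everything below is an assumption for illustration and is not the authors' released code: `toy_model` is a hypothetical, deliberately brittle stand-in for a real reading comprehension system, the paragraph and distractor are invented, the distractor is simply appended to the end of the paragraph, and `token_f1` is a simplified token-overlap F1 in the style commonly used for SQuAD answers.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def toy_model(paragraph, question):
    """Hypothetical brittle reader: returns the word after the LAST
    'moved to' in the paragraph and ignores the question entirely."""
    matches = re.findall(r"moved to (\w+)", paragraph)
    return matches[-1] if matches else ""


def evaluate(model, paragraph, question, gold):
    """F1 of the model's answer on a single (paragraph, question) pair."""
    return token_f1(model(paragraph, question), gold)


# Invented example: the distractor resembles the question but does not
# change the correct answer.
paragraph = "Marie moved to Lyon in 1990."
question = "Which city did Marie move to in 1990?"
gold = "Lyon"
distractor = "Jeff moved to Oslo in 1991."

clean_f1 = evaluate(toy_model, paragraph, question, gold)
adversarial_f1 = evaluate(toy_model, paragraph + " " + distractor, question, gold)
print(clean_f1, adversarial_f1)  # prints: 1.0 0.0
```

A human reader still answers "Lyon" for the modified paragraph, while the pattern-matching stand-in switches to the distractor. Averaging the two scores over many questions and models would give the kind of before-and-after F1 comparison the abstract reports.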

References [35]

[1]Generative Adversarial Nets

Ian J. Goodfellow, J. P. Abadie, M. Mirza, B. Xu, D. W. Farley, S. Ozair, Aaron Courville, Yoshua Bengio - 2014

2 papers in library cite

The idea is surprisingly simple. I was a bit afraid of the math, but it's very simple. There's proof that this converges and finds an optimum. I just wish they explained more, or gave better results. It just seems that it finishes and I'm still hungry.

[2]GloVe: Global Vectors for Word Representation

Jeffrey Pennington, Richard Socher, Christopher D. Manning - 2014

31 papers in library cite

Not a bad paper, I just don't like the motivation and I think the methodology is poorly explained and hard to follow. I can't deny the good results though...

[3]Explaining and Harnessing Adversarial Examples

Ian J. Goodfellow, J. Shlens, Christian Szegedy - 2015

4 papers in library cite

It feels that it is an extension of the previous paper, which is nice. It is less original but also easier to read!

[4]Intriguing Properties of Neural Networks

Rob Fergus - 2014

7 papers in library cite

Very nice, and the first to notice how flaky NNs are. I think they went overboard with the math at the end, but the rest of the paper is very good.

[5]SQuAD: 100,000+ Questions for Machine Comprehension of Text

P. Rajpurkar, J. Zhang, K. Lopyrev, Percy Liang - 2016

37 papers in library cite

Nice paper that introduced an important dataset. Not much else though.

[6]Bidirectional Attention Flow for Machine Comprehension

M. Seo, A. Kembhavi, Ali Farhadi, Hannaneh Hajishirzi - 2017

13 papers in library cite

It's alright but the method seems absurdly complex. Maybe I am a bit biased because it's like the 20th paper that I read with attention + LSTMs...

[7]The LAMBADA dataset: Word Prediction Requiring a Broad Discourse Context

D. Paperno, German Kruszewski, A. Lazaridou, N. Q. Pham, R. Bernardi, S. Pezzelle, M. Baroni, G. Boleda, Raquel Fernandez - 2016

12 papers in library cite

Very nice paper, with a very interesting methodology for building the dataset. It's great to see a dataset that is meant to make machines fail.

[8]On Our Best Behaviour

Hector J. Levesque - 2013

2 papers in library cite

I loved this paper. So relevant despite being 10+ years old! Everyone that works with AI should read this.

[9]WordNet: An Electronic Lexical Database

C. Fellbaum - 1998

12 papers in library cite

It's huge and I don't think it will add much (it is a book)

[10]Generating Sentences From a Continuous Space

R. Jozefowicz, Samy Bengio - 2016

1 paper in library cites

Not sure why I marked this, but it's Samy Bengio.

[11]Adversarial Learning for Neural Dialogue Generation

Jeffrey Li, W. Monroe, T. Shi, A. Ritter, Dan Jurafsky - 2017

1 paper in library cites

Adversarial conversation

[12]Machine Comprehension Using Match-LSTM and Answer Pointer

Shijie Wang, J. J. Jiang - 2017

6 papers in library cite

Answer pointer

[13]Making Neural QA as Simple as Possible but Not Simpler

Dirk Weissenborn, G. Wiese, L. Seiffe - 2017

4 papers in library cite

I love the title

[14]The Stanford CoreNLP Natural Language Processing Toolkit

Christopher D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, D. Mcclosky - 2014

6 papers in library cite

[15]Learning Recurrent Span Representations for Extractive Question Answering

K. Lee, S. Salant, T. Kwiatkowski, A. P. Parikh, Dipanjan Das, Jonathan Berant - 2017

3 papers in library cite

[16]ReasoNet: Learning to Stop Reading in Machine Comprehension

Y. Shen, P. Huang, Jianfeng Gao, Weizhu Chen - 2017

3 papers in library cite

[17]Brown Corpus Manual

W. N. Francis, H. Kucera - 1979

2 papers in library cite

[18]Nightmare at Test Time: Robust Learning by Feature Deletion

A. Globerson, S. Roweis - 2006

2 papers in library cite

[19]Adversarial Classification

N. Dalvi, P. Domingos, Mausam, S. Sanghai, D. Verma - 2004

1 paper in library cites

[20]Adversarial Evaluation for Models of Natural Language

Noah A. Smith - 2012

1 paper in library cites

[21]Adversarial Learning

D. Lowd, C. Meek - 2005

1 paper in library cites

[22]Build It, Break It: The Language Edition

E. M. Bender, H. D. Iii, A. Ettinger, H. Kannan, S. Rao, E. Rothschild - 2017

1 paper in library cites

[23]Data Recombination for Neural Semantic Parsing

R. Jia, Percy Liang - 2016

1 paper in library cites

[24]End-to-End Answer Chunk Extraction and Ranking for Reading Comprehension

Y. Yu, Wenxuan Zhang, K. Hasan, M. Yu, Bing Xiang, B. Zhou - 2016

1 paper in library cites

[25]Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering

J. Zhang, X. Zhu, Qinlang Chen, L. Dai, S. Wei, H. Jiang - 2017

1 paper in library cites

[26]Generating Phrasal and Sentential Paraphrases: A Survey of Data-Driven Methods

N. Madnani, B. J. Dorr - 2010

1 paper in library cites

[27]Mnemonic Reader for Machine Comprehension

M. Hu, Y. Peng, X. Qiu - 2017

1 paper in library cites

[28]Multi-Perspective Context Matching for Machine Comprehension

Zhengtao Wang, H. Mi, W. Hamza, R. Florian - 2016

1 paper in library cites

[29]Practical Black-Box Attacks Against Deep Learning Systems Using Adversarial Examples

N. Papernot, P. Mcdaniel, I. Goodfellow, S. Jha, Z. Celik, A. Swami - 2017

1 paper in library cites

[30]Ruminating Reader: Reasoning With Gated Multi-Hop Attention

Y. Gong, Samuel R. Bowman - 2017

1 paper in library cites

[31]Simple Black-Box Adversarial Perturbations for Deep Networks

N. Narodytska, S. P. Kasiviswanathan - 2016

1 paper in library cites

[32]Structural Embedding of Syntactic Trees for Machine Comprehension

Rosanne Liu, Jiaxi Hu, W. Wei, Zhilin Yang, E. Nyberg - 2017

1 paper in library cites

[33]Treebank-3

M. Marcus, B. Santorini, Mary Ann Marcinkiewicz, A. Taylor - 1999

1 paper in library cites

[34]Unbounded Dependency Recovery for Parser Evaluation

L. Rimell, S. Clark, M. Steedman - 2009

1 paper in library cites

[35]Universal Adversarial Perturbations

S. M. Dezfooli, A. Fawzi, O. Fawzi, P. Frossard - 2017

1 paper in library cites

Cited by

11

papers in your library

Cites

13

papers in your library

Read

on November 4, 2025

I liked it a lot! It's good to see people testing things rather than just trying to beat SotA!
