2019

Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

R. T. McCoy, Ellie Pavlick, Tal Linzen

AI summary

This paper introduces HANS, a dataset for testing whether NLI models rely on fallible syntactic heuristics. Models such as BERT are found to rely on these heuristics and perform poorly on HANS. Augmenting the training data with HANS-like examples improves performance, indicating that NLI systems can make progress by addressing these biases.

Main Contributions

  • Introduces the HANS dataset, an NLI evaluation set designed to test specific hypotheses about invalid heuristics.
  • Shows that four existing NLI models trained on MNLI, including state-of-the-art ones, perform very poorly on HANS, suggesting that their high accuracies on NLI test sets may be due to the exploitation of invalid heuristics.
  • Shows that these shortcomings can be mitigated: models perform significantly better on both HANS and a separate structure-dependent dataset when their training data is augmented with HANS-like examples.
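
To make the evaluation idea in the list above concrete, here is a minimal sketch of how accuracy on a HANS-style set could be broken down by heuristic and gold label. The field names (`heuristic`, `gold`), the toy examples, and the choice to collapse three-way MNLI predictions into entailment versus non-entailment are assumptions made for illustration; this is not the authors' evaluation code.

```python
from collections import defaultdict


def collapse(label):
    """Map a three-way NLI label onto two labels (assumed HANS-style scheme)."""
    return "entailment" if label == "entailment" else "non-entailment"


def accuracy_by_group(examples, predictions):
    """Return accuracy for each (heuristic, gold label) group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        key = (example["heuristic"], example["gold"])
        total[key] += 1
        correct[key] += int(collapse(predicted) == example["gold"])
    return {key: correct[key] / total[key] for key in total}


# Hypothetical toy data: only the metadata needed for scoring.
toy_examples = [
    {"heuristic": "lexical_overlap", "gold": "entailment"},
    {"heuristic": "lexical_overlap", "gold": "non-entailment"},
    {"heuristic": "subsequence", "gold": "non-entailment"},
]
toy_predictions = ["entailment", "entailment", "neutral"]

scores = accuracy_by_group(toy_examples, toy_predictions)
for group, accuracy in sorted(scores.items()):
    print(group, accuracy)
```

Reporting the entailment and non-entailment groups separately matters here: a model that always answers "entailment" would score perfectly on the first group and zero on the second, which an overall average would hide.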

Abstract

A machine learning system can score well on a given test set by relying on heuristics that are effective for frequent example types but break down in more challenging cases. We study this issue within natural language inference (NLI), the task of determining whether one sentence entails another. We hypothesize that statistical NLI models may adopt three fallible syntactic heuristics: the lexical overlap heuristic, the subsequence heuristic, and the constituent heuristic. To determine whether models have adopted these heuristics, we introduce a controlled evaluation set called HANS (Heuristic Analysis for NLI Systems), which contains many examples where the heuristics fail. We find that models trained on MNLI, including BERT, a state-of-the-art model, perform very poorly on HANS, suggesting that they have indeed adopted these heuristics. We conclude that there is substantial room for improvement in NLI systems, and that the HANS dataset can motivate and measure progress in this area.
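
Two of the three heuristics named in the abstract can be approximated with simple string checks, sketched below. The sketch assumes that the lexical overlap heuristic fires when every hypothesis word appears in the premise, and that the subsequence heuristic fires when the hypothesis is a contiguous word sequence of the premise. The function names, the tokenizer, and the sentence pairs are chosen here for illustration. The constituent heuristic is omitted because checking it would require a syntactic parse of the premise.

```python
import re


def tokens(sentence):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", sentence.lower())


def lexical_overlap_applies(premise, hypothesis):
    """True if every hypothesis word also occurs somewhere in the premise."""
    premise_words = set(tokens(premise))
    return all(word in premise_words for word in tokens(hypothesis))


def subsequence_applies(premise, hypothesis):
    """True if the hypothesis words form a contiguous run inside the premise."""
    p, h = tokens(premise), tokens(hypothesis)
    return any(p[i:i + len(h)] == h for i in range(len(p) - len(h) + 1))


# Both pairs are non-entailments, yet a heuristic-following model
# would be tempted to label them as entailment.
passive = ("The doctor was paid by the actor.", "The doctor paid the actor.")
modifier = ("The doctor near the actor danced.", "The actor danced.")

print(lexical_overlap_applies(*passive))   # True
print(subsequence_applies(*passive))       # False
print(lexical_overlap_applies(*modifier))  # True
print(subsequence_applies(*modifier))      # True
```

In both pairs the hypothesis reuses only premise words but does not follow from the premise, which is the kind of case where, per the abstract, the heuristics fail.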
