2018

Think You Have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord

Cite Score: 73

AI summary

This paper introduces the AI2 Reasoning Challenge (ARC): a dataset of 7,787 grade-school science questions, a 14M-sentence science corpus, and three neural baselines (DecompAttn, BiDAF, DGEM), released to foster research in advanced question answering. It shows that current models struggle on a "Challenge Set" designed to require deeper reasoning.

Main Contributions

  • Introduction of the AI2 Reasoning Challenge (ARC) dataset, consisting of 7,787 natural, grade-school science questions.
  • Creation of a Challenge Set (2,590 questions) designed to be difficult for simple retrieval and co-occurrence algorithms, and an Easy Set (5,197 questions).
  • Release of the ARC Corpus, a corpus of 14M science sentences, to aid in addressing the challenge.
  • Adaptation and testing of three neural baseline models (DecompAttn, BiDAF, DGEM) on ARC.
  • Demonstration that current state-of-the-art neural models fail to significantly outperform a random baseline on the Challenge Set, highlighting the need for advanced QA methods.
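
The Challenge/Easy split described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the question schema, the `partition` helper, and the two toy solvers (`first_choice`, `longest_choice`) are hypothetical stand-ins for the retrieval-based and word co-occurrence algorithms named in the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical schema (an assumption, not the ARC release format):
# {"id": str, "choices": {label: text}, "answer": label}
Question = Dict[str, object]
Solver = Callable[[Question], str]  # returns the label of the chosen answer


def first_choice(q: Question) -> str:
    """Toy stand-in solver: always picks the alphabetically first label."""
    return min(q["choices"])


def longest_choice(q: Question) -> str:
    """Toy stand-in solver: picks the label whose answer text is longest."""
    choices = q["choices"]
    return max(choices, key=lambda label: len(choices[label]))


def partition(
    questions: List[Question], solvers: List[Solver]
) -> Tuple[List[Question], List[Question]]:
    """Put a question in the challenge set only if every solver answers it wrong."""
    challenge, easy = [], []
    for q in questions:
        if all(solve(q) != q["answer"] for solve in solvers):
            challenge.append(q)
        else:
            easy.append(q)
    return challenge, easy


questions = [
    {"id": "q1", "choices": {"A": "water", "B": "photosynthesis"}, "answer": "A"},
    {"id": "q2", "choices": {"A": "gravity", "B": "ice"}, "answer": "B"},
    {"id": "q3", "choices": {"A": "sun", "B": "evaporation"}, "answer": "B"},
]
challenge, easy = partition(questions, [first_choice, longest_choice])
```

In this toy run only `q2` defeats both solvers, so it alone lands in the challenge set. The real split depends entirely on which baseline solvers are used.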

Abstract

We present a new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering. Together, these constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI. The ARC question set is partitioned into a Challenge Set and an Easy Set, where the Challenge Set contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. The dataset contains only natural, grade-school science questions (authored for human tests), and is the largest public-domain set of this kind (7,787 questions). We test several baselines on the Challenge Set, including leading neural models from the SQuAD and SNLI tasks, and find that none are able to significantly outperform a random baseline, reflecting the difficult nature of this task. We are also releasing the ARC Corpus, a corpus of 14M science sentences relevant to the task, and implementations of the three neural baseline models tested. Can your model perform better? We pose ARC as a challenge to the community.
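
To make the "random baseline" comparison concrete, the sketch below computes the expected accuracy of uniform random guessing. It assumes the questions are multiple-choice, which this page does not state explicitly, and the helper name `random_baseline_accuracy` is hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def random_baseline_accuracy(option_counts: Sequence[int]) -> Fraction:
    """Expected accuracy of uniform random guessing.

    option_counts[i] is the number of answer options for question i
    (an assumed input format). A uniform guess is right with probability
    1/k on a k-option question, so the expectation is the mean of 1/k.
    """
    if not option_counts:
        raise ValueError("need at least one question")
    return sum(Fraction(1, k) for k in option_counts) / len(option_counts)


# With four options on every question, random guessing scores 25%.
all_four_way = random_baseline_accuracy([4, 4, 4, 4])

# Mixed option counts shift the baseline slightly.
mixed = random_baseline_accuracy([4, 4, 3, 5])
```

A model's accuracy on the Challenge Set is only meaningful relative to this figure, which is why the abstract reports results against it.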


