2021
AI summary
This paper argues that current NLU benchmarks are broken and proposes four criteria (validity, reliable annotation, statistical power, and disincentives for biased models) that future benchmarks should satisfy to facilitate progress in language understanding.
Abstract
Evaluation for many natural language understanding (NLU) tasks is broken: Unreliable and biased systems score so highly on standard benchmarks that there is little room for researchers who develop better systems to demonstrate their improvements. The recent trend to abandon IID benchmarks in favor of adversarially-constructed, out-of-distribution test sets ensures that current models will perform poorly, but ultimately only obscures the abilities that we want our benchmarks to measure. In this position paper, we lay out four criteria that we argue NLU benchmarks should meet. We argue most current benchmarks fail at these criteria, and that adversarial data collection does not meaningfully address the causes of these failures. Instead, restoring a healthy evaluation ecosystem will require significant progress in the design of benchmark datasets, the reliability with which they are annotated, their size, and the ways they handle social bias.