2021
Cite Score
82
AI summary
This paper introduces GSM8K, a dataset of 8.5K math word problems, and proposes training verifiers to judge the correctness of model completions, demonstrating significant performance improvements on GSM8K and better scaling with data compared to finetuning baselines.
Abstract
State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures of current models and support research, we introduce GSM8K, a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. We find that even the largest transformer models fail to achieve high test performance, despite the conceptual simplicity of this problem distribution. To increase performance, we propose training verifiers to judge the correctness of model completions. At test time, we generate many candidate solutions and select the one ranked highest by the verifier. We demonstrate that verification significantly improves performance on GSM8K, and we provide strong empirical evidence that verification scales more effectively with increased data than a finetuning baseline.
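The test-time procedure described in the abstract (sample many candidate solutions, then keep the one the verifier ranks highest) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `select_best_solution`, `generate`, `score`, and the default `num_candidates=8` are hypothetical names and values chosen for the example. In practice `generate` would wrap a finetuned language model and `score` a trained verifier; here both are toy stand-ins.

```python
from typing import Callable, List, Tuple


def select_best_solution(
    problem: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    num_candidates: int = 8,  # arbitrary illustrative default
) -> Tuple[str, float]:
    """Sample candidate solutions and return the highest-scored one.

    `generate` and `score` are hypothetical callables standing in for a
    solution generator and a verifier, respectively.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")

    # Draw several independent candidate solutions for the same problem.
    candidates: List[str] = [generate(problem) for _ in range(num_candidates)]

    # Ask the verifier for a correctness score for each candidate.
    scored = [(score(problem, candidate), candidate) for candidate in candidates]

    # Keep the candidate with the highest verifier score
    # (on ties, max() returns the first one encountered).
    best_score, best_candidate = max(scored, key=lambda pair: pair[0])
    return best_candidate, best_score


# Toy stand-ins so the sketch runs without any model.
_canned = iter(["answer: 7", "answer: 8", "answer: 9"])


def toy_generate(problem: str) -> str:
    # Pretend to sample a solution; really just yields canned strings.
    return next(_canned)


def toy_score(problem: str, solution: str) -> float:
    # Pretend verifier: favors the solution that ends in "8".
    return 1.0 if solution.endswith("8") else 0.1


best, best_score = select_best_solution(
    "What is 3 + 5?", toy_generate, toy_score, num_candidates=3
)
print(best, best_score)
```

Passing the generator and verifier in as plain callables keeps the selection step independent of any particular model or library, which is a design choice of this sketch rather than something the source specifies.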