2021

Measuring Mathematical Problem Solving With the MATH Dataset

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt

AI summary

This paper introduces MATH, a new dataset of 12,500 challenging competition mathematics problems with step-by-step solutions, and AMPS, a large auxiliary pretraining dataset. It finds that even enormous Transformer models struggle with mathematical problem solving, which suggests that new algorithmic advances are needed.

Main Contributions

  • Introduction of MATH, a new dataset of 12,500 challenging competition mathematics problems with full step-by-step solutions, for measuring mathematical problem-solving ability in machine learning models.
  • Contribution of AMPS, a large auxiliary pretraining dataset derived from Khan Academy and Mathematica, to help models learn the fundamentals of mathematics.
  • Demonstration that state-of-the-art Transformer models achieve relatively low accuracy on MATH, and that, if current scaling trends continue, simply increasing model size and compute budgets would be an impractical route to strong mathematical reasoning.
  • Observation that having models generate their own step-by-step solutions before answering can degrade accuracy, whereas providing partial ground-truth solutions as hints, or training on step-by-step solutions, can improve performance.
  • Analysis showing that AMPS pretraining enables smaller models to perform comparably to significantly larger fine-tuned models, highlighting its value as a pretraining dataset.
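The partial-solution hints mentioned above can be illustrated with a minimal Python sketch. It assumes a hint is simply the leading `fraction` of the ground-truth solution's whitespace-separated tokens and uses a hypothetical `Problem:` / `Partial Solution:` prompt template; `build_hint_prompt` is an illustrative name, and the paper's actual prompt format and truncation scheme may differ.

```python
import math


def build_hint_prompt(problem: str, solution: str, fraction: float) -> str:
    """Build a prompt that reveals the first `fraction` of a ground-truth solution.

    Hypothetical helper: the hint is the leading share of the solution's
    whitespace-separated tokens, and the prompt template is illustrative only.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    tokens = solution.split()
    # Round down so any fraction below 1 withholds at least the final token.
    keep = math.floor(len(tokens) * fraction)
    hint = " ".join(tokens[:keep])
    return f"Problem: {problem}\nPartial Solution: {hint}"


if __name__ == "__main__":
    problem = "What is $2+3$?"
    solution = "Adding the two numbers gives $2+3=5$, so the answer is $\\boxed{5}$."
    print(build_hint_prompt(problem, solution, 0.5))
```

Rounding down keeps the hint strictly shorter than the full solution for any fraction below 1, so the model still has to produce the end of the derivation itself.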

Abstract

Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers. To measure this ability in machine learning models, we introduce MATH, a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations. To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics. Even though we are able to increase accuracy on MATH, our results show that accuracy remains relatively low, even with enormous Transformer models. Moreover, we find that simply increasing budgets and model parameter counts will be impractical for achieving strong mathematical reasoning if scaling trends continue. While scaling Transformers is automatically solving most other text-based tasks, scaling is not currently solving MATH. To have more traction on mathematical problem solving we will likely need new algorithmic advancements from the broader research community.
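Accuracy on MATH-style problems can be scored by comparing final answers rather than full derivations. The sketch below assumes each solution marks its final answer with `\boxed{...}`, extracts the last such answer with brace matching so nested expressions such as `\frac{1}{2}` survive, and counts exact matches after removing whitespace. The function names are hypothetical, and the normalization is deliberately minimal; a real evaluator would likely need to handle equivalent forms (e.g. `0.5` versus `\frac{1}{2}`) more carefully.

```python
from typing import Optional, Sequence


def last_boxed_answer(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `solution`, or None."""
    start = solution.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1  # we are already inside the opening brace of \boxed{
    for j in range(i, len(solution)):
        ch = solution[j]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return solution[i:j]
    return None  # unbalanced braces: treat as no answer


def normalize(answer: str) -> str:
    # Minimal normalization (assumption): drop all whitespace.
    return "".join(answer.split())


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions whose boxed answer matches the reference's boxed answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        p, r = last_boxed_answer(pred), last_boxed_answer(ref)
        if p is not None and r is not None and normalize(p) == normalize(r):
            correct += 1
    return correct / len(references)
```

A prediction without a well-formed `\boxed{...}` simply counts as wrong, which keeps the metric strict about answer formatting.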
