2022

Solving Math Word Problems With Process- And Outcome-Based Feedback

Jonathan Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, Irina Higgins

Cite Score: 29

AI summary

This paper compares process-based and outcome-based supervision for language models on the GSM8K math word problem dataset. Both approaches reach similar final-answer error rates, but low trace error requires either process-based feedback or learned reward models that emulate it. The best models improve on the previous best results, reaching 12.7% final-answer error and 3.4% reasoning error among final-answer-correct solutions.

Main Contributions

  • First comprehensive comparison between process- and outcome-based approaches for language models on a natural language task (GSM8K).
  • Pure outcome-based supervision yields similar final-answer error rates with less label supervision.
  • Improved the previous best results from 16.8% to 12.7% final-answer error and from 14.0% to 3.4% reasoning error among final-answer-correct solutions.
  • Demonstrated that process-based supervision or a reward model emulating it is necessary for low trace error.
  • Found that outcome-supervised reward models approximate process-based labels, helping to explain their effectiveness in reducing trace error.
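
The difference between the two kinds of supervision can be illustrated with a minimal sketch of how per-step training labels might be built for a reward model. This is an illustration under stated assumptions, not the authors' implementation: the `Solution` class, its fields, and both labeling functions are hypothetical names, and the rule that a process label stays 1 only until the first incorrect step is an assumption about how step-level annotations are used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    """Hypothetical container for one model-generated solution."""
    steps: List[str]           # reasoning steps, in order
    step_correct: List[bool]   # assumed per-step human annotations
    final_answer: str
    reference_answer: str


def outcome_labels(sol: Solution) -> List[int]:
    """Outcome-based: every step inherits one label from the final answer."""
    label = int(sol.final_answer == sol.reference_answer)
    return [label] * len(sol.steps)


def process_labels(sol: Solution) -> List[int]:
    """Process-based (assumed rule): 1 while all steps so far are correct."""
    labels, all_correct_so_far = [], True
    for correct in sol.step_correct:
        all_correct_so_far = all_correct_so_far and correct
        labels.append(int(all_correct_so_far))
    return labels


# A solution that reaches the right answer through a faulty step.
example = Solution(
    steps=["2 + 3 = 5", "5 * 2 = 11", "11 - 1 = 10"],
    step_correct=[True, False, True],
    final_answer="10",
    reference_answer="10",
)
print(outcome_labels(example))  # [1, 1, 1]
print(process_labels(example))  # [1, 0, 0]
```

The example solution is exactly the case the paper cares about: outcome labels mark it as fully correct, while process labels flag the faulty reasoning from the second step onward.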

Abstract

Recent work has shown that asking language models to generate reasoning steps improves performance on many reasoning tasks. When moving beyond prompting, this raises the question of how we should supervise such models: outcome-based approaches which supervise the final result, or process-based approaches which supervise the reasoning process itself? Differences between these approaches might naturally be expected not just in final-answer errors but also in reasoning errors, which can be difficult to detect and are problematic in many real-world domains such as education. We run the first comprehensive comparison between process- and outcome-based approaches trained on a natural language task, GSM8K. We find that pure outcome-based supervision produces similar final-answer error rates with less label supervision. However, for correct reasoning steps we find it necessary to use process-based supervision or supervision from learned reward models that emulate process-based feedback. In total, we improve the previous best results from 16.8% → 12.7% final-answer error and 14.0% → 3.4% reasoning error among final-answer-correct solutions.
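
The two metrics reported in the abstract can be sketched as follows. This is a minimal illustration, assuming each evaluated solution has already been judged on two points: whether its final answer is correct and whether its reasoning trace is correct. The function name `error_rates` and the record layout are hypothetical, and the sample data is made up for the example rather than taken from the paper.

```python
from typing import List, Tuple

# Each record: (final_answer_correct, reasoning_trace_correct)
Record = Tuple[bool, bool]


def error_rates(records: List[Record]) -> Tuple[float, float]:
    """Return (final-answer error rate, reasoning error rate).

    The reasoning error rate is computed only over solutions whose
    final answer is correct, matching the abstract's phrasing.
    """
    total = len(records)
    final_errors = sum(1 for final_ok, _ in records if not final_ok)
    traces_of_correct = [trace_ok for final_ok, trace_ok in records if final_ok]
    trace_errors = sum(1 for trace_ok in traces_of_correct if not trace_ok)

    final_rate = final_errors / total if total else 0.0
    trace_rate = trace_errors / len(traces_of_correct) if traces_of_correct else 0.0
    return final_rate, trace_rate


# Made-up sample: 6 fully correct, 2 right answer with wrong reasoning, 2 wrong.
sample = [(True, True)] * 6 + [(True, False)] * 2 + [(False, False)] * 2
final_rate, trace_rate = error_rates(sample)
print(final_rate, trace_rate)  # 0.2 0.25
```

Restricting the second metric to final-answer-correct solutions is what lets it expose solutions that look right but reach the answer through flawed reasoning.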
