2022
Cite Score: 29
AI summary
This paper compares process-based and outcome-based supervision of language models on the GSM8K math word problem dataset. It finds that both achieve similar final-answer error rates, but that process-based feedback, or reward models trained to emulate process-based feedback, substantially reduces trace (reasoning) errors, yielding new state-of-the-art results.
Abstract
Recent work has shown that asking language models to generate reasoning steps improves performance on many reasoning tasks. When moving beyond prompting, this raises the question of how we should supervise such models: outcome-based approaches which supervise the final result, or process-based approaches which supervise the reasoning process itself? Differences between these approaches might naturally be expected not just in final-answer errors but also in reasoning errors, which can be difficult to detect and are problematic in many real-world domains such as education. We run the first comprehensive comparison between process- and outcome-based approaches trained on a natural language task, GSM8K. We find that pure outcome-based supervision produces similar final-answer error rates with less label supervision. However, for correct reasoning steps we find it necessary to use process-based supervision or supervision from learned reward models that emulate process-based feedback. In total, we improve the previous best results from 16.8% to 12.7% final-answer error and from 14.0% to 3.4% reasoning error among final-answer-correct solutions.
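The two kinds of label and the two error metrics in the abstract can be illustrated with a small Python sketch. It assumes a hypothetical annotation format in which each sampled solution carries its final answer and a per-step correctness judgement; the names `Solution`, `outcome_label`, `process_label`, and `error_rates` are illustrative and do not come from the paper. The outcome-based label only compares the final answer with the reference, while the process-based label (assumed here to require every step to be judged correct) also catches solutions that reach the right answer through a faulty step. The reasoning error is computed only over final-answer-correct solutions, which is how the abstract reports it.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Solution:
    # Hypothetical annotation format, not the paper's data schema.
    final_answer: str
    step_correct: List[bool] = field(default_factory=list)  # one judgement per reasoning step


def outcome_label(sol: Solution, reference_answer: str) -> bool:
    # Outcome-based label: looks only at whether the final answer matches.
    return sol.final_answer.strip() == reference_answer.strip()


def process_label(sol: Solution) -> bool:
    # Process-based label (assumed rule): every reasoning step must be judged correct.
    return all(sol.step_correct)


def error_rates(solutions: Sequence[Solution], references: Sequence[str]) -> Tuple[float, float]:
    """Return (final-answer error rate, reasoning error rate among final-answer-correct solutions)."""
    if len(solutions) != len(references) or not solutions:
        raise ValueError("need one reference answer per solution, and at least one solution")
    correct = [s for s, ref in zip(solutions, references) if outcome_label(s, ref)]
    final_answer_error = 1.0 - len(correct) / len(solutions)
    if not correct:
        # Reasoning error is undefined when no final answer is correct.
        return final_answer_error, float("nan")
    reasoning_errors = sum(1 for s in correct if not process_label(s))
    return final_answer_error, reasoning_errors / len(correct)


if __name__ == "__main__":
    sols = [
        Solution("18", [True, True]),
        Solution("18", [True, False]),  # right answer reached through a flawed step
        Solution("20", [False, True]),
        Solution("5", [True]),
    ]
    refs = ["18", "18", "18", "5"]
    fa_err, reasoning_err = error_rates(sols, refs)
    print(f"final-answer error: {fa_err:.2f}, reasoning error: {reasoning_err:.2f}")
    # prints "final-answer error: 0.25, reasoning error: 0.33"
```

On this toy data an outcome-based labeler would accept the second solution, while the process-based label flags it; that is the kind of case the abstract's reasoning-error metric is meant to capture.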
Cited by 4 papers in your library
Cites 26 papers in your library
Read on May 31, 2026