2023
Cite Score
67
AI summary
This paper compares process supervision with outcome supervision for training reliable reward models on complex multi-step reasoning problems from the MATH dataset. It finds that process supervision significantly outperforms outcome supervision, that the process-supervised model solves 78% of problems from a representative subset of the MATH test set, and that active learning makes process supervision 2.6x more data-efficient.
Abstract
In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step. Given the importance of training reliable models, and given the high cost of human feedback, it is important to carefully compare both methods. Recent work has already begun this comparison, but many questions still remain. We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision. To support related research, we also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.
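The difference between the two feedback signals described in the abstract can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen for illustration: the `Solution` type, the per-step probabilities, and the choice to combine step scores by taking their product are assumptions, not details taken from the abstract.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class Solution:
    """A candidate solution: its reasoning steps plus a final answer (hypothetical type)."""
    steps: List[str]
    final_answer: str


def outcome_label(solution: Solution, reference_answer: str) -> int:
    """Outcome supervision: a single label that depends only on the final result."""
    return int(solution.final_answer == reference_answer)


def process_score(step_probs: List[float]) -> float:
    """Process supervision: combine per-step correctness probabilities.

    Assumption for this sketch: the solution score is the product of the
    step probabilities, so one doubtful step lowers the whole score.
    """
    return prod(step_probs)


def pick_best(scores: List[float]) -> int:
    """Return the index of the highest-scoring candidate."""
    return max(range(len(scores)), key=scores.__getitem__)


# Two candidates that reach the same final answer.
sound = Solution(steps=["expand", "simplify", "solve"], final_answer="4")
lucky = Solution(steps=["expand", "sign error", "solve"], final_answer="4")

# Outcome supervision cannot tell them apart.
labels = [outcome_label(s, "4") for s in (sound, lucky)]

# Hypothetical step-level probabilities, as a step-level reward model might emit.
scores = [process_score([0.9, 0.9, 0.8]), process_score([0.95, 0.2, 0.9])]
best = pick_best(scores)
```

In this toy case both candidates receive the same outcome label, while the step-level scores separate the sound derivation from the one that reaches the right answer through a flawed step.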