2023
Cite Score
34
AI summary
This paper introduces MATH-SHEPHERD, a process-oriented math reward model that assigns scores to each step of math problem solutions using automatically constructed supervision data, achieving significant accuracy improvements on GSM8K and MATH datasets with various LLMs.
Abstract
In this paper, we present MATH-SHEPHERD, a process reward model for mathematical reasoning that assigns a reward score to each step of a math problem solution. MATH-SHEPHERD is trained on automatically constructed process-wise supervision data, removing the heavy reliance on manual annotation that bottlenecks existing work. We explore its effectiveness in two scenarios: 1) Verification: MATH-SHEPHERD reranks multiple outputs generated by Large Language Models (LLMs); 2) Reinforcement Learning: MATH-SHEPHERD reinforces LLMs through step-by-step Proximal Policy Optimization (PPO). With MATH-SHEPHERD, a series of open-source LLMs achieves strong performance. For instance, step-by-step PPO with MATH-SHEPHERD improves the accuracy of Mistral-7B from 77.9% to 84.1% on GSM8K and from 28.6% to 33.0% on MATH. Adding MATH-SHEPHERD verification raises accuracy further, to 89.1% on GSM8K and 43.5% on MATH. We believe that automatic process supervision holds significant potential for the future evolution of LLMs.
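The verification scenario can be illustrated with a minimal sketch. It assumes each candidate solution already carries one reward score per step, and it collapses those into a single solution score by taking the minimum. The abstract does not say how step scores are combined, so the `min` aggregation, the `Candidate` class, and the example scores are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Candidate:
    """One sampled solution: its final answer and a reward score per step."""
    answer: str
    step_scores: Sequence[float]  # hypothetical per-step scores in [0, 1]


def solution_score(
    step_scores: Sequence[float],
    aggregate: Callable[[Sequence[float]], float] = min,
) -> float:
    """Collapse per-step scores into one score (assumption: min by default)."""
    if not step_scores:
        raise ValueError("a candidate needs at least one scored step")
    return aggregate(step_scores)


def rerank(
    candidates: Sequence[Candidate],
    aggregate: Callable[[Sequence[float]], float] = min,
) -> List[Candidate]:
    """Return candidates sorted from highest to lowest aggregated score."""
    return sorted(
        candidates,
        key=lambda c: solution_score(c.step_scores, aggregate),
        reverse=True,
    )


# Made-up scores for three sampled solutions to the same problem.
candidates = [
    Candidate("42", [0.9, 0.8, 0.2]),    # strong start, weak final step
    Candidate("36", [0.7, 0.75, 0.8]),   # consistently plausible steps
    Candidate("42", [0.95, 0.4, 0.9]),   # one doubtful middle step
]

best = rerank(candidates)[0]
print(best.answer)
```

With a minimum-based aggregate, a single low-scoring step is enough to push a solution down the ranking, which is one plausible way to use step-level rewards; passing a different `aggregate` (for example `max`) changes which candidate wins.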