2023

Math-Shepherd: Verify and Reinforce LLMs Step-by-Step Without Human Annotations

Peiyi Wang, Lei Li, Zhihong Shao, Runxin Xu, Damai Dai, Yifei Li, Deli Chen, Y. Wu, Zhifang Sui

AI summary

This paper introduces MATH-SHEPHERD, a process-oriented math reward model that assigns scores to each step of math problem solutions using automatically constructed supervision data, achieving significant accuracy improvements on GSM8K and MATH datasets with various LLMs.

Main Contributions

  • Propose MATH-SHEPHERD, an automatic process annotation framework for math reasoning tasks, eliminating the need for human annotations.
  • Introduce a novel definition of an intermediate step's quality based on its potential to deduce the correct final answer.
  • Demonstrate MATH-SHEPHERD's effectiveness in two scenarios: reranking LLM outputs for verification and reinforcing LLMs with step-by-step Proximal Policy Optimization (PPO).
  • Achieve exceptional performance with open-source LLMs, for instance, improving Mistral-7B accuracy to 84.1% on GSM8K and 33.0% on MATH via step-by-step PPO.
  • Further enhance accuracy to 89.1% on GSM8K and 43.5% on MATH through MATH-SHEPHERD verification, setting new benchmarks for open-source models without additional tools.
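
The second contribution, scoring a step by its potential to reach the correct final answer, can be illustrated with a small sketch. It assumes a hypothetical `complete` callable that samples one finished solution from a step prefix and returns its final answer. The names `hard_label`, `soft_label`, and `label_steps`, and the choice to report both an any-rollout label and a fraction-of-rollouts label, are illustrative assumptions and not details stated on this page.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: given the steps written so far, sample one
# continuation and return the final answer it arrives at.
Completer = Callable[[Sequence[str]], str]


def hard_label(answers: Sequence[str], gold: str) -> int:
    """1 if at least one sampled continuation reaches the gold answer, else 0."""
    return int(any(a == gold for a in answers))


def soft_label(answers: Sequence[str], gold: str) -> float:
    """Fraction of sampled continuations that reach the gold answer."""
    if not answers:
        raise ValueError("need at least one sampled continuation")
    return sum(a == gold for a in answers) / len(answers)


def label_steps(
    steps: Sequence[str], gold: str, complete: Completer, n: int = 8
) -> List[Tuple[int, float]]:
    """Label every step prefix by sampling n continuations from it."""
    labels = []
    for i in range(1, len(steps) + 1):
        prefix = steps[:i]
        answers = [complete(prefix) for _ in range(n)]
        labels.append((hard_label(answers, gold), soft_label(answers, gold)))
    return labels


def toy_completer(prefix: Sequence[str]) -> str:
    """Stand-in for an LLM: any prefix containing a bad step ends at a wrong answer."""
    return "0" if any("WRONG" in step for step in prefix) else "42"


if __name__ == "__main__":
    solution = ["6 * 7 = 42", "WRONG: 42 - 42 = 0"]
    print(label_steps(solution, gold="42", complete=toy_completer, n=4))
```

With a real sampler the continuations would differ between calls, so the soft label would typically fall strictly between 0 and 1 for ambiguous steps.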

Abstract

In this paper, we present an innovative process reward model for math called MATH-SHEPHERD, which assigns a reward score to each step of math problem solutions. The training of MATH-SHEPHERD is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of MATH-SHEPHERD in two scenarios: 1) Verification: MATH-SHEPHERD is utilized for reranking multiple outputs generated by Large Language Models (LLMs); 2) Reinforcement Learning: MATH-SHEPHERD is employed to reinforce LLMs with step-by-step Proximal Policy Optimization (PPO). With MATH-SHEPHERD, a series of open-source LLMs demonstrates exceptional performance. For instance, the step-by-step PPO with MATH-SHEPHERD significantly improves the accuracy of Mistral-7B (77.9%→84.1% on GSM8K and 28.6%→33.0% on MATH). The accuracy can be further enhanced to 89.1% and 43.5% on GSM8K and MATH with the verification of MATH-SHEPHERD, respectively. We believe that automatic process supervision holds significant potential for the future evolution of LLMs.
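
The verification scenario described in the abstract, reranking several sampled solutions by their step scores, might look like the following sketch. Aggregating a solution's step scores by taking their minimum is an assumption made here for illustration, as are the function names `solution_score`, `rerank`, and `best_of_n`. The step scores would come from the trained reward model, which is replaced by fixed numbers below.

```python
from typing import Dict, List, Sequence, Tuple


def solution_score(step_scores: Sequence[float]) -> float:
    """Score a whole solution by its weakest step (assumed aggregation rule)."""
    if not step_scores:
        raise ValueError("a solution needs at least one scored step")
    return min(step_scores)


def rerank(candidates: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (candidate id, solution score) pairs, best first."""
    scored = [(name, solution_score(scores)) for name, scores in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def best_of_n(candidates: Dict[str, Sequence[float]]) -> str:
    """Pick the id of the highest-scoring candidate."""
    return rerank(candidates)[0][0]


if __name__ == "__main__":
    # Made-up per-step scores for two sampled solutions to the same problem.
    candidates = {
        "A": [0.9, 0.2, 0.95],   # one weak step drags the whole solution down
        "B": [0.7, 0.8, 0.75],
    }
    print(rerank(candidates))
    print(best_of_n(candidates))
```

Taking the minimum penalizes a single bad step even when the others look strong. Other aggregations, such as the product of step scores, would slot into `solution_score` without changing the rest.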
