2024

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y.K. Li, Y. Wu, Daya Guo

AI summary

This paper introduces DeepSeekMath 7B, a language model further pre-trained on 120B math-related tokens that achieves 51.7% on the MATH benchmark without external tools. It also presents Group Relative Policy Optimization (GRPO), a PPO variant aimed at strengthening mathematical reasoning.

Main Contributions

  • Introduces DeepSeekMath 7B, an open language model that achieves 51.7% on the MATH benchmark without external tools or voting, approaching the performance of Gemini-Ultra and GPT-4.
  • Develops the DeepSeekMath Corpus, a high-quality pre-training dataset of 120B math tokens extracted from Common Crawl using a fastText-based classifier and human annotation.
  • Presents Group Relative Policy Optimization (GRPO), an efficient variant of PPO that forgoes the critic model, reducing training resources while enhancing mathematical reasoning.
  • Demonstrates that code training prior to math training improves models' ability to solve mathematical problems.
  • Provides a unified paradigm for understanding different training methods (RFT, DPO, PPO, GRPO) and conducts extensive experiments on their essential elements.
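
The critic-free idea behind GRPO can be pictured with a small sketch. It assumes one scalar reward per sampled answer and uses the mean and standard deviation of each group of answers to the same question as the baseline. The function name, the `eps` term, and the use of the population standard deviation are illustrative assumptions, not details stated on this page.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled answer relative to its own group.

    Assumption: `rewards` holds one scalar reward per answer sampled
    for the same question. The group mean stands in for the value
    estimate that a PPO critic would otherwise provide.
    """
    if not rewards:
        return []
    baseline = mean(rewards)   # group-level baseline, no learned critic
    spread = pstdev(rewards)   # population std; eps guards against zero
    return [(r - baseline) / (spread + eps) for r in rewards]

# Hypothetical rewards for four sampled answers (1.0 = correct).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers in the group receive positive advantages and incorrect ones negative. When every answer in a group gets the same reward, all advantages are zero, so that group contributes no learning signal in this sketch.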

Abstract

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
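
The self-consistency result in the abstract comes from sampling many answers and keeping the most frequent one. A minimal sketch of that voting step follows, assuming the final answers have already been extracted from each sample as strings. The function name and the whitespace normalization are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter

def self_consistency_answer(sampled_answers):
    """Return the most frequent final answer among the samples.

    Assumption: each element is an already-extracted final answer.
    Only surrounding whitespace is normalized; empty answers are skipped.
    """
    cleaned = (answer.strip() for answer in sampled_answers)
    counts = Counter(answer for answer in cleaned if answer)
    if not counts:
        return None  # no usable answer was sampled
    return counts.most_common(1)[0][0]

# Hypothetical extracted answers from four samples of one problem.
voted = self_consistency_answer(["42", "41", "42 ", "7"])
```

In practice the answers would need stronger normalization (for example, treating equivalent fractions or LaTeX forms as the same answer) before voting; that step is left out here.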
