DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
H. Xin, Z. Z. Ren, Junxiao Song, Zhihong Shao, W. Zhao, Haiming Wang, Bing Liu, Li Zhang, X. Lu, Q. Du, W. Gao, Qihao Zhu, Diyi Yang, Z. Gou, Z. F. Wu, F. Luo, C. Ruan