Multi-Module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs
Noah Ziems, Dilara Soylu, Lakshya A. Agrawal, I. Miller, L. Lai, C. Qian, K. Song, Meng Jiang, Dan Klein, Matei Zaharia, K. D'Oosterlinck, Christopher Potts, Omar Khattab