SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning
S. Cao, S. Hegde, Dustin Li, T. Griggs, Shuming Liu, Eric Tang, J. Pan, Xinpeng Wang, A. Malik, Graham Neubig, K. Hakhamaneshi, R. Liaw, P. Moritz, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica