2017
Cite Score
96
AI summary
This paper introduces Proximal Policy Optimization (PPO), a new family of policy gradient methods whose novel objective function allows multiple epochs of minibatch updates on each batch of sampled data. In the paper's experiments, PPO outperforms other online policy gradient methods on simulated robotic locomotion and Atari game-playing benchmarks.
Abstract
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
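The abstract names a surrogate objective and multiple epochs of minibatch updates but does not write either out. The following is a minimal sketch, not the authors' implementation. It assumes the clipped form of the surrogate given in the full paper, where the probability ratio between the new and old policy is clipped to the interval [1 - eps, 1 + eps]. The function names, the pure-Python list inputs, and the value clip_eps=0.2 are illustrative assumptions.

```python
import math
import random


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    All three are assumed to be equal-length lists of floats.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum gives a pessimistic bound on the unclipped term.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def minibatch_indices(n, batch_size, epochs, rng):
    """Yield shuffled index minibatches, reusing the same n samples each epoch."""
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for start in range(0, n, batch_size):
            yield order[start:start + batch_size]


if __name__ == "__main__":
    old = [math.log(0.5), math.log(0.5)]
    new = [math.log(0.75), math.log(0.25)]
    adv = [1.0, -1.0]
    print(clipped_surrogate(new, old, adv))
    for batch in minibatch_indices(n=6, batch_size=4, epochs=2, rng=random.Random(0)):
        print(batch)
```

In a full training loop, each minibatch from minibatch_indices would feed one stochastic gradient ascent step on clipped_surrogate, after which fresh data would be sampled with the updated policy. The sketch leaves out the gradient computation and the advantage estimation.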