2017

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

Cite Score: 96

AI summary

This paper introduces Proximal Policy Optimization (PPO), a new family of policy gradient methods whose novel objective function permits multiple epochs of minibatch updates on the same sampled data. In the paper's experiments, PPO outperforms other online policy gradient methods on simulated robotic locomotion and Atari game-playing benchmarks.

Main Contributions

  • Introduces Proximal Policy Optimization (PPO), a new family of policy gradient methods.
  • Proposes a novel objective function with clipped probability ratios that enables multiple epochs of minibatch updates per data sample.
  • PPO is much simpler to implement and more general than trust region policy optimization (TRPO), while retaining some of its benefits.
  • Shows empirically that PPO has better sample complexity than TRPO and outperforms other online policy gradient methods on benchmark tasks.
  • Achieves strong performance on simulated robotic locomotion and Atari game playing.
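
The clipped-probability-ratio objective mentioned in the contributions can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `clipped_surrogate`, its arguments, and the pure-Python loop are assumptions made for illustration, and the default clip range of 0.2 is only a commonly used value. The sketch computes the mean of min(r·A, clip(r, 1−ε, 1+ε)·A), where r is the ratio of new to old action probabilities and A is an advantage estimate.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    logp_new, logp_old: log-probabilities of the sampled actions under the
    current policy and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    clip_eps: half-width of the clip range around a ratio of 1 (assumed default).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and old policy for this sample.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage of 1 and a ratio of 2, the sketch returns roughly 1.2 rather than 2, so a sample gives no further incentive to push the ratio beyond the clip range. With a negative advantage the unclipped term is kept whenever it is lower, which is what makes the objective a pessimistic bound.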

Abstract

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
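
The "multiple epochs of minibatch updates" described in the abstract can be pictured as a schedule over one batch of sampled data. The sketch below is an assumption-laden illustration rather than the paper's algorithm: the generator name `minibatch_epochs` and its parameters are hypothetical, and both data collection and the gradient step are left out. It only shows how the same batch of sample indices might be reshuffled and split into minibatches several times before new data is collected.

```python
import random


def minibatch_epochs(num_samples, num_epochs, minibatch_size, seed=0):
    """Yield (epoch, indices) pairs that reuse one batch for several epochs.

    num_samples: number of transitions in the collected batch.
    num_epochs: how many passes to make over that same batch.
    minibatch_size: number of indices per minibatch (last one may be smaller).
    """
    rng = random.Random(seed)
    indices = list(range(num_samples))
    for epoch in range(num_epochs):
        # Reshuffle so each epoch visits the samples in a different order.
        rng.shuffle(indices)
        for start in range(0, num_samples, minibatch_size):
            # A gradient ascent step on the surrogate objective would use
            # exactly these samples (omitted in this sketch).
            yield epoch, indices[start:start + minibatch_size]


# Example: a batch of 8 samples, 3 epochs, minibatches of 4 gives 6 updates.
schedule = list(minibatch_epochs(num_samples=8, num_epochs=3, minibatch_size=4))
```

In a full training loop, this schedule would sit between two rounds of environment interaction, which is the alternation the abstract describes.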



Cited by 10 papers in your library

Cites 4 papers in your library

Read on May 21, 2026

Tags

RLHF, Vetto Study
