2017

Deep Reinforcement Learning From Human Preferences

Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

Cite Score: 79

AI summary

This paper introduces a deep reinforcement learning approach that uses non-expert human preferences between trajectory segments to solve complex RL tasks, including Atari games and simulated robot locomotion, with minimal human feedback.

Main Contributions

  • Demonstrates training deep RL systems using human preferences as a reward signal, significantly reducing the cost of human oversight.
  • Shows the ability to learn complex tasks like Atari games and simulated robot locomotion without direct access to a reward function.
  • Introduces an approach where human feedback is provided on less than 1% of the agent's interactions, making it practically applicable to state-of-the-art RL systems.
  • Successfully trains complex novel behaviors (e.g., robot backflips) with approximately one hour of human input.
  • Highlights the importance of online feedback collection to improve performance and prevent exploitation of learned reward function weaknesses.

Abstract

For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than 1% of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any which have been previously learned from human feedback.
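The abstract does not spell out how a preference between two trajectory segments becomes a training signal. One common choice, assumed here for illustration, is a Bradley-Terry style model: the probability that a human prefers one segment is a logistic function of the difference in summed predicted rewards, and the reward predictor is fit by cross-entropy against the human labels. The sketch below uses hypothetical names (`Step`, `segment_return`, `preference_probability`, `preference_loss`) and a toy scalar observation and action. It is not the authors' implementation.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical types for illustration: one step is an (observation, action) pair.
Step = Tuple[float, float]
RewardFn = Callable[[float, float], float]


def segment_return(reward_fn: RewardFn, segment: Sequence[Step]) -> float:
    """Sum the predicted reward over every step of a trajectory segment."""
    return sum(reward_fn(obs, act) for obs, act in segment)


def preference_probability(
    reward_fn: RewardFn, seg_a: Sequence[Step], seg_b: Sequence[Step]
) -> float:
    """Modelled probability that seg_a is preferred over seg_b.

    Assumes a Bradley-Terry style model: a logistic function of the
    difference between the two segments' summed predicted rewards.
    """
    diff = segment_return(reward_fn, seg_a) - segment_return(reward_fn, seg_b)
    # Numerically stable logistic: never exponentiate a large positive number.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(
    reward_fn: RewardFn,
    seg_a: Sequence[Step],
    seg_b: Sequence[Step],
    label: float,
) -> float:
    """Cross-entropy between the modelled preference and a human label.

    label is 1.0 if the human preferred seg_a, 0.0 if they preferred seg_b,
    and 0.5 if they judged the two segments equally good (an assumed convention).
    """
    p = preference_probability(reward_fn, seg_a, seg_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


if __name__ == "__main__":
    # Toy reward predictor: reward equals the observation value.
    toy_reward = lambda obs, act: obs
    seg_a = [(1.0, 0.0), (2.0, 0.0)]
    seg_b = [(0.0, 0.0), (0.5, 0.0)]
    print(preference_probability(toy_reward, seg_a, seg_b))
    print(preference_loss(toy_reward, seg_a, seg_b, label=1.0))
```

In a full system, `reward_fn` would be a neural network trained by minimizing this loss over the collected comparisons, and the RL agent would then optimize the predicted reward in place of the environment's true reward.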
