2017
AI summary
This paper introduces a deep reinforcement learning approach that uses non-expert human preferences between trajectory segments to solve complex RL tasks, including Atari games and simulated robot locomotion, with minimal human feedback.
Abstract
For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than 1% of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any which have been previously learned from human feedback.
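As an illustration of how a preference between two trajectory segments could be turned into a training signal, here is a minimal sketch. It assumes a Bradley-Terry style model in which the probability of preferring a segment grows with the sum of the rewards a learned reward model predicts for it. The function names, the per-step reward lists, and the label convention (1.0 for the first segment, 0.0 for the second, 0.5 for a tie) are illustrative assumptions, not details stated in the abstract.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Probability that segment A is preferred over segment B.

    Assumption: a Bradley-Terry style model over the summed predicted
    rewards of each segment (hypothetical inputs: lists of floats).
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Logistic function of the reward difference, written to avoid overflow.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and a human label.

    Assumption: label is 1.0 if the human preferred A, 0.0 if they
    preferred B, and 0.5 if they judged the segments equally good.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

Minimizing this loss over many labeled pairs would push the predicted rewards toward agreement with the human comparisons. An RL agent could then be trained on the predicted rewards in place of the environment's own reward function.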