AI summary
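This paper applies reward learning to fine-tune pretrained language models on human preference comparisons. The models are evaluated on four tasks: continuing text with positive sentiment or with physically descriptive language (BookCorpus), and summarization on the CNN/Daily Mail and TL;DR datasets. Stylistic continuation reaches good results with only 5,000 human comparisons. Summarization models trained with 60,000 comparisons are rated very well by human labelers and reach reasonable ROUGE scores, but they mostly copy whole sentences from the input, which may exploit labelers' reliance on simple heuristics.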
Abstract
Reward learning enables the application of reinforcement learning (RL) to tasks where reward is defined by human judgment, building a model of reward by asking humans questions. Most work on reward learning has used simulated environments, but complex information about values is often expressed in natural language, and we believe reward learning for language is a key to making RL practical and safe for real-world tasks. In this paper, we build on advances in generative pretraining of language models to apply reward learning to four natural language tasks: continuing text with positive sentiment or physically descriptive language, and summarization tasks on the TL;DR and CNN/Daily Mail datasets. For stylistic continuation we achieve good results with only 5,000 comparisons evaluated by humans. For summarization, models trained with 60,000 comparisons copy whole sentences from the input but skip irrelevant preamble; this leads to reasonable ROUGE scores and very good performance according to our human labelers, but may be exploiting the fact that labelers rely on simple heuristics.
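The abstract describes building a reward model from human comparisons without giving a formula. As a rough illustration only, the sketch below assumes a common formulation: each candidate continuation gets a scalar reward, and the loss is the softmax cross-entropy of the candidate the human chose. The function name `preference_loss` and the example numbers are hypothetical, and nothing on this page confirms that this is the exact loss used in the paper.

```python
import math


def preference_loss(rewards, chosen):
    """Negative log-probability of the human-chosen candidate.

    Assumption (not taken from this page): the probability that a human
    picks candidate i is softmax(rewards)[i].
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(rewards)
    log_z = m + math.log(sum(math.exp(r - m) for r in rewards))
    return log_z - rewards[chosen]


# Hypothetical reward-model scores for four sampled continuations of one
# prompt; the human labeler preferred the candidate at index 1.
scores = [0.2, 1.5, -0.3, 0.0]
loss = preference_loss(scores, chosen=1)
print(f"loss = {loss:.4f}")
```

Minimizing this loss over many labeled comparisons pushes the reward model to score preferred continuations higher than the alternatives; the learned reward could then serve as the training signal for RL fine-tuning.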