2020

Learning to Summarize from Human Feedback

Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano

Cite Score: 68

AI summary

This paper introduces a method for training language models to optimize for human preferences. A large dataset of human comparisons is used to train a reward model, which then serves as the reward function for fine-tuning a summarization policy with reinforcement learning. The resulting models outperform both human reference summaries and much larger supervised models on the Reddit TL;DR dataset, and they transfer to CNN/DM news articles without news-specific fine-tuning.

Main Contributions

  • Demonstrated significant improvement in summary quality by training a model to optimize for human preferences using reinforcement learning.
  • Collected a large, high-quality dataset of human comparisons between summaries for training a reward model.
  • Showed that models trained with human feedback significantly outperform much larger models trained with supervised learning alone, and that they generalize to a new domain (CNN/DM news articles) without news-specific fine-tuning.
  • Conducted extensive empirical analyses of how well the reward model generalizes, and showed that it predicts human preferences better than ROUGE does.
  • Publicly released the human feedback dataset for further research, containing 64,832 summary comparisons.
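The comparison between a reward model and ROUGE in the contributions above can be pictured as an agreement rate: for each human comparison, check whether a scoring function ranks the human-preferred summary higher. The sketch below is illustrative only. The function `preference_accuracy`, the toy scorer `overlap_score`, the tuple layout, and the sample data are all hypothetical and are not taken from the paper or its released dataset.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical layout: (post, human-preferred summary, rejected summary).
Comparison = Tuple[str, str, str]


def preference_accuracy(
    score: Callable[[str, str], float],
    comparisons: Sequence[Comparison],
) -> float:
    """Fraction of comparisons where `score` ranks the preferred summary strictly higher."""
    if not comparisons:
        raise ValueError("need at least one comparison")
    hits = sum(
        1
        for post, preferred, rejected in comparisons
        if score(post, preferred) > score(post, rejected)
    )
    return hits / len(comparisons)


def overlap_score(post: str, summary: str) -> float:
    """Toy stand-in for a metric: share of summary words that also occur in the post."""
    post_words = set(post.lower().split())
    words = summary.lower().split()
    if not words:
        return 0.0
    return sum(w in post_words for w in words) / len(words)


# Made-up examples; in the second one the toy scorer disagrees with the human label.
toy_comparisons = [
    ("my cat knocked over the lamp again", "cat knocked over lamp", "dog ate homework"),
    ("we moved to a new city last year", "nice weather today", "moved to new city"),
]
accuracy = preference_accuracy(overlap_score, toy_comparisons)
```

Any scorer with the same `(post, summary) -> float` signature, such as a learned reward model or an overlap metric, could be passed in and compared on the same held-out comparisons.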

Abstract

As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about: summary quality. In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences. We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning. We apply our method to a version of the TL;DR dataset of Reddit posts [63] and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone. Our models also transfer to CNN/DM news articles [22], producing summaries nearly as good as the human reference without any news-specific fine-tuning. We conduct extensive analyses to understand our human feedback dataset and fine-tuned models. We establish that our reward model generalizes to new datasets, and that optimizing our reward model results in better summaries than optimizing ROUGE according to humans. We hope the evidence from our paper motivates machine learning researchers to pay closer attention to how their training loss affects the model behavior they actually want.
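As a rough illustration of the step "train a model to predict the human-preferred summary", the sketch below computes a pairwise logistic loss on two scalar rewards. The abstract does not state the loss, so this form, a common choice for learning from pairwise comparisons, is an assumption here, and the function name `pairwise_preference_loss` is hypothetical.

```python
import math


def pairwise_preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_preferred - reward_rejected)).

    The loss is small when the preferred summary receives the higher reward
    and grows as the rejected summary is scored above it.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() is only ever called on a non-positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Equal rewards give log(2); a correct ranking gives a smaller loss than a wrong one.
tie_loss = pairwise_preference_loss(0.0, 0.0)
correct_loss = pairwise_preference_loss(2.0, -1.0)
wrong_loss = pairwise_preference_loss(-1.0, 2.0)
```

In a real training loop the two rewards would come from a learned model scoring each (post, summary) pair, and the loss would be averaged over a batch of human comparisons and minimized by gradient descent.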

References [73]

D. P. Kingma, Jimmy Lei Ba - 2014

49 papers in library cite

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin - 2017

47 papers in library cite

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei - 2020

21 papers in library cite

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov - 2017

10 papers in library cite

Yoshua Bengio - 2010

20 papers in library cite

Alec Radford, Jeffrey Wu, Rewon Child, D. Luan, Dario Amodei, Ilya Sutskever - 2019

27 papers in library cite

Alec Radford, K. Narasimhan, T. Salimans, Ilya Sutskever - 2018

23 papers in library cite

Martha Lewis, Yibo Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer - 2019

6 papers in library cite

Yonghui Wu, M. Schuster, Ziru Chen, Quoc V. Le, M. Norouzi, W. Macherey, M. Krikun, Yue Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. J. Johnson, Xiaodong Liu, Lukasz Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, Wenyi Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, Oriol Vinyals, G. S. Corrado, M. Hughes, Jeffrey Dean - 2016

15 papers in library cite

Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei - 2017

11 papers in library cite

K. M. Hermann, T. Kocisky, Edward Grefenstette, L. Espeholt, W. Kay, M. Suleyman, Phil Blunsom - 2015

31 papers in library cite

A. See, P. J. Liu, Christopher D. Manning - 2017

8 papers in library cite

Alexander M. Rush, S. Chopra, Jason Weston - 2015

13 papers in library cite

R. Paulus, Caiming Xiong, Richard Socher - 2017

7 papers in library cite

Geoffrey Irving - 2020

7 papers in library cite

A. M. Dai, Quoc V. Le - 2015

27 papers in library cite

Richard Socher - 2018

9 papers in library cite

Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg - 2018

5 papers in library cite

Paul Christiano, Buck Shlegeris, Dario Amodei - 2018

7 papers in library cite

Colin Raffel, Noam Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, Wentao Li, P. J. Liu - 2019

17 papers in library cite

Ari Holtzman, J. Buys, L. Du, M. Forbes, Yejin Choi - 2019

5 papers in library cite

P. Covington, J. Adams, E. Sargin - 2016

2 papers in library cite

J. Maynez, Shashi Narayan, B. Bohnet, R. Mcdonald - 2020

6 papers in library cite

L. Dong, N. Yang, Wenyi Wang, F. Wei, Xiaodong Liu, Yuzhi Wang, Jianfeng Gao, M. Zhou, H. W. Hon - 2019

4 papers in library cite

Marc'aurelio Ranzato, S. Chopra, Michael Auli, Wojciech Zaremba - 2015

6 papers in library cite

K. Song, X. Tan, T. Qin, J. Lu, T. Y. Liu - 2019

5 papers in library cite

S. Chopra, Michael Auli, A. Rush, S. Harvard - 2016

5 papers in library cite

M. Volske, Martin Potthast, S. Syed, Benno Stein - 2017

4 papers in library cite

John Schulman, P. Moritz, Sergey Levine, M. Jordan, P. Abbeel - 2015

5 papers in library cite

B. Ibarz, Jan Leike, T. Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei - 2018

5 papers in library cite

Ethan Perez, S. Karamcheti, Rob Fergus, Jason Weston, Douwe Kiela, Kyunghyun Cho - 2019

4 papers in library cite

F. Bohm, Y. Gao, C. M. Meyer, O. Shapira, Ido Dagan, I. Gurevych - 2019

3 papers in library cite

B. Dorr, D. Zajic, Richard Schwartz - 2003

3 papers in library cite

B. Hancock, Antoine Bordes, P. E. Mazare, Jason Weston - 2019

3 papers in library cite

Yonghui Wu, B. Hu - 2018

3 papers in library cite

S. Welleck, I. Kulikov, S. Roller, E. Dinan, Kyunghyun Cho, Jason Weston - 2019

3 papers in library cite

N. Jaques, S. Gu, D. Bahdanau, J. M. H. Lobato, R. E. Turner, D. Eck - 2017

3 papers in library cite

W. S. Cho, Peizhao Zhang, Y. Z. Zhang, Xiang Lisa Li, M. Galley, Chris Brockett, Mingliang Wang, Jianfeng Gao - 2019

3 papers in library cite

S. Yi, R. Goel, C. Khatri, T. Chung, Behnam Hedayatnia, Anu Venkatesh, Raefer Gabriel, D. H. Tur - 2019

3 papers in library cite

N. Jaques, A. Ghandeharioun, J. H. Shen, C. Ferguson, A. Lapedriza, N. Jones, S. Gu, R. Picard - 2019

3 papers in library cite

M. Li, Jason Weston, S. Roller - 2019

2 papers in library cite

D. Bahdanau, P. Brakel, K. Xu, A. G. A. P. Goyal, Ryan Lowe, J. Pineau, Aaron Courville, Yoshua Bengio - 2016

2 papers in library cite

J. Kreutzer, S. Khadivi, E. Matusov, S. Riezler - 2018

2 papers in library cite

P. Tambwekar, M. Dhuliawala, A. Mehta, L. J. Martin, B. Harrison, M. O. Riedl - 2018

2 papers in library cite

C. Lawrence, S. Riezler - 2018

2 papers in library cite

W. Kryscinski, Nitish Shirish Keskar, B. Mccann, Caiming Xiong, Richard Socher - 2019

2 papers in library cite

J. Zhang, Y. Zhao, M. Saleh, P. J. Liu - 2019

2 papers in library cite

K. Nguyen, H. D. Iii, J. B. Graber - 2017

2 papers in library cite

Y. Gao, C. M. Meyer, M. Mesgar, I. Gurevych - 2019

2 papers in library cite

A. Chaganty, S. Mussmann, Percy Liang - 2018

2 papers in library cite

S. Ross, G. Gordon, D. Bagnell - 2011

1 paper in library cites

R. Likert - 1932

1 paper in library cites

Y. Z. Zhang, Dustin Li, Yuzhi Wang, Y. Fang, W. Xiao - 2019

1 paper in library cites

T. Joachims, L. Granka, B. Pan, H. Hembrooke, G. Gay - 2005

1 paper in library cites

B. T. Bartell, G. W. Cottrell, R. K. Belew - 1994

1 paper in library cites

Y. Dong, Y. Shen, E. Crawford, H. V. Hoof, J. C. K. Cheung - 2018

1 paper in library cites

J. Dodge, G. Ilharco, Richard Schwartz, Ali Farhadi, Hananneh Hajishirzi, N. Smith - 2020

1 paper in library cites

F. Schmidt - 2019

1 paper in library cites

T. Y. Liu - 2011

1 paper in library cites

S. Rothe, Shashi Narayan, A. Severyn - 2020

1 paper in library cites

T. Joachims - 2002

1 paper in library cites

N. Fuhr - 1989

1 paper in library cites

T. Niu, Mohit Bansal - 2018

1 paper in library cites

Y. Yan, W. Qi, Y. Gong, D. Liu, N. Duan, Jixuan Chen, Robert Zhang, M. Zhou - 2020

1 paper in library cites

H. J. Jeon, S. Milli, A. D. Dragan - 2020

1 paper in library cites

S. Cabi, S. G. Colmenarejo, A. Novikov, K. Konyushkova, S. Reed, R. Jeong, K. Zolna, Y. Aytar, D. Budden, M. Vecerik - 2019

1 paper in library cites

D. R. Reddy - 1977

1 paper in library cites

Sanja Fidler - 2017

1 paper in library cites

N. Schluter - 2017

1 paper in library cites

Haowei Zhang, D. Duckworth, Daphne Ippolito, Arvind Neelakantan - 2020

1 paper in library cites

N. Jaques, S. Gu, R. E. Turner, D. Eck - 2017

1 paper in library cites

Cited by 10 papers in your library

Cites 28 papers in your library

Read on May 21, 2026

Tags

Vetto StudyRLHF
