2022
Training Language Models to Follow Instructions With Human Feedback
Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, C. Wainwright, Pamela Mishkin, Chiyuan Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe
AI summary
This paper introduces InstructGPT, a family of GPT-3 models fine-tuned with human feedback. Compared with the original GPT-3, the models follow user intent more closely, are more truthful, and produce less toxic output, with only minimal performance regressions on public NLP datasets. In human evaluations, outputs from the 1.3B-parameter InstructGPT model are preferred to those from the 175B-parameter GPT-3.
Abstract
Making language models bigger does not inherently make them better at following a user’s intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
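The abstract says that rankings of model outputs drive the reinforcement learning stage, but it does not state the objective used to learn from those rankings. One common way to turn a ranking into a training signal is a pairwise logistic loss over scalar reward scores. The sketch below illustrates that idea in plain Python, under the assumption that each ranked output has already been assigned a score by some reward model; the function names and the example scores are hypothetical and are not taken from the paper's code.

```python
import math
from itertools import combinations


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(scores_best_to_worst):
    """Mean of -log(sigmoid(better - worse)) over every pair in one ranking.

    `scores_best_to_worst` holds hypothetical reward-model scores for the
    outputs of a single prompt, ordered from the labeler's most preferred
    output to the least preferred one.
    """
    # combinations() keeps input order, so the first item of each pair
    # is always the output the labeler ranked higher.
    pairs = list(combinations(scores_best_to_worst, 2))
    if not pairs:
        raise ValueError("need at least two ranked outputs")
    # -log(sigmoid(d)) equals softplus(-d), with d = better - worse.
    total = sum(softplus(worse - better) for better, worse in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    agrees = pairwise_ranking_loss([2.0, 1.0, 0.0])     # scores match the ranking
    disagrees = pairwise_ranking_loss([0.0, 1.0, 2.0])  # scores invert the ranking
    print(f"loss when scores agree with labelers:    {agrees:.4f}")
    print(f"loss when scores disagree with labelers: {disagrees:.4f}")
```

The loss is small when the scores order the outputs the way the labeler did and grows when they disagree, so minimizing it pushes a reward model toward the human ranking. Such a model can then supply the reward signal for the reinforcement learning step.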