2021

Evaluating Large Language Models Trained on Code

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba

Cite Score: 84

AI summary

This paper introduces Codex, a GPT language model fine-tuned on publicly available code from GitHub, and demonstrates strong Python code-writing capabilities: it solves 28.8% of the problems in HumanEval, a new evaluation set, and 70.2% with repeated sampling (100 samples per problem). Its descendants power GitHub Copilot and the OpenAI API.

Main Contributions

  • Introduced Codex, a GPT language model fine-tuned on publicly available code from GitHub, and studied its Python code-writing capabilities.
  • Released HumanEval, a new evaluation set to measure functional correctness for synthesizing programs from docstrings.
  • Codex solved 28.8% of HumanEval problems, outperforming GPT-3 (0%) and GPT-J (11.4%).
  • Showed that repeated sampling is a surprisingly effective strategy, solving 70.2% of problems with 100 samples per problem.
  • Discussed limitations of Codex, including difficulty with docstrings that describe long chains of operations and with binding operations to variables, as well as the potential broader impacts of deploying code generation models.
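The functional-correctness criterion behind HumanEval can be sketched as follows: a sampled completion counts as correct only if the docstring prompt plus the completion passes the problem's unit tests. This is a minimal sketch, not the released evaluation harness; the `passes_tests` helper, the convention that the test code defines `check(candidate)`, and the toy `add` problem are illustrative assumptions.

```python
def passes_tests(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if prompt + completion passes the unit tests in `test`.

    Assumption: `test` defines check(candidate), which raises (e.g. an
    AssertionError) when the candidate function is wrong.
    WARNING: exec runs untrusted code in this process with no isolation.
    """
    program = prompt + completion + "\n\n" + test + "\n\n" + f"check({entry_point})\n"
    try:
        # A fresh globals dict keeps each program's definitions separate.
        exec(program, {})
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as failures.
        return False
    return True


# Hypothetical toy problem in the docstring-to-code style (not from HumanEval).
PROMPT = '''def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
'''
TEST = '''def check(candidate):
    assert candidate(2, 3) == 5
    assert candidate(-1, 1) == 0
'''

print(passes_tests(PROMPT, "    return a + b\n", TEST, "add"))  # True
print(passes_tests(PROMPT, "    return a - b\n", TEST, "add"))  # False
```

Running model output through `exec` in the current process is unsafe; a real harness would isolate each program and enforce a timeout.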

Abstract

We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
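Solve rates under repeated sampling are usually reported as pass@k: draw n ≥ k samples per problem, count the c samples that pass the unit tests, and estimate the probability that at least one of k samples is correct. The sketch below uses the unbiased estimator 1 - C(n-c, k)/C(n, k) described in the full Codex paper (it is not stated in the abstract above); the function names and the per-problem counts are hypothetical.

```python
from math import comb, prod


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples drawn, c: samples that passed the unit tests, k: sample budget.
    Evaluates 1 - C(n - c, k) / C(n, k) as a running product, which avoids
    forming the huge binomial coefficients explicitly.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the n samples contains a correct one.
        return 1.0
    return 1.0 - prod(1.0 - k / i for i in range(n - c + 1, n + 1))


def pass_at_k_exact(n: int, c: int, k: int) -> float:
    """Direct binomial form, used here only as a cross-check."""
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem counts: (samples drawn, samples passing).
results = [(200, 0), (200, 10), (200, 200)]
print(mean_pass_at_k(results, k=1))  # approximately 0.35
print(mean_pass_at_k(results, k=100))
```

Averaging over all problems gives the benchmark-level score. Drawing more samples than the budget (n > k) and applying the estimator gives a lower-variance estimate than drawing exactly k samples per problem.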



Cited by 9 papers in your library

Cites 32 papers in your library

Read on June 2, 2026


Tags: canonBenchmark

Paper Aliases: none