2021

Program Synthesis With Large Language Models

Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, Charles Sutton

Cite Score: 69

AI summary

This paper explores the limits of large language models for program synthesis. It introduces two new benchmarks, MBPP and MathQA-Python, and shows that synthesis performance scales log-linearly with model size; the largest fine-tuned model reaches 83.8% accuracy on MathQA-Python.

Main Contributions

  • Introduces two new datasets for Python code synthesis: Mostly Basic Programming Problems (MBPP) with 974 tasks and MathQA-Python with 23914 problems.
  • Demonstrates that large language models perform surprisingly well at few-shot synthesis of Python programs from natural language, with performance scaling log-linearly with model size (up to 59.6% on MBPP few-shot, 83.8% on MathQA-Python fine-tuned).
  • Shows that models can engage in dialog about code and use natural language feedback from a human to improve their solutions; four turns of dialog halve the error rate relative to the model's initial prediction.
  • Analyzes the semantic grounding of models, finding that even the largest models are generally unable to predict program execution output given specific inputs.
  • Investigates the impact of prompt examples and test cases on synthesis performance, and finds that solutions typically generalize to held-out test cases and that the benchmark problems have minimal overlap with the pre-training data.
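
The contributions above rest on checking synthesized programs against test cases. A minimal sketch of that idea follows. It is an illustration rather than the paper's actual harness: the helper names `passes_tests` and `solve_rate`, the assert-string test format, the "solved if any sample passes" rule, and the toy task are all assumptions introduced here.

```python
from typing import List


def passes_tests(candidate_src: str, tests: List[str]) -> bool:
    """Return True if the candidate runs and every assert-style test passes.

    Hypothetical helper. exec() on untrusted model output is unsafe; a real
    harness would sandbox the execution and enforce a timeout.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)   # define the candidate's functions
        for test in tests:
            exec(test, namespace)        # each test is an `assert ...` statement
    except Exception:
        return False                     # syntax error, runtime error, or failed assert
    return True


def solve_rate(samples_per_task: List[List[str]],
               tests_per_task: List[List[str]]) -> float:
    """Fraction of tasks where at least one sampled program passes all tests.

    The "any sample passes" criterion is an assumption of this sketch.
    """
    solved = sum(
        any(passes_tests(src, tests) for src in samples)
        for samples, tests in zip(samples_per_task, tests_per_task)
    )
    return solved / len(tests_per_task)


# Toy task invented for this sketch (not taken from MBPP).
toy_tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"

print(passes_tests(good, toy_tests))
print(passes_tests(bad, toy_tests))
print(solve_rate([[bad, good], [bad]], [toy_tests, toy_tests]))
```

Running the tests in the same namespace as the candidate keeps the sketch short; it also means a candidate could interfere with the tests, which is one more reason a real evaluation would isolate each run.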

Abstract

This paper explores the limits of the current generation of large language models for program synthesis in general purpose programming languages. We evaluate a collection of such models (with between 244M and 137B parameters) on two new benchmarks, MBPP and MathQA-Python, in both the few-shot and fine-tuning regimes. Our benchmarks are designed to measure the ability of these models to synthesize short Python programs from natural language descriptions. The Mostly Basic Programming Problems (MBPP) dataset contains 974 programming tasks, designed to be solvable by entry-level programmers. The MathQA-Python dataset, a Python version of the MathQA benchmark, contains 23914 problems that evaluate the ability of the models to synthesize code from more complex text. On both datasets, we find that synthesis performance scales log-linearly with model size. Our largest models, even without finetuning on a code dataset, can synthesize solutions to 59.6% of the problems from MBPP using few-shot learning with a well-designed prompt. Fine-tuning on a held-out portion of the dataset improves performance by about 10 percentage points across most model sizes. On the MathQA-Python dataset, the largest fine-tuned model achieves 83.8% accuracy. Going further, we study the model’s ability to engage in dialog about code, incorporating human feedback to improve its solutions. We find that natural language feedback from a human halves the error rate compared to the model’s initial prediction. Additionally, we conduct an error analysis to shed light on where these models fall short and what types of programs are most difficult to generate. Finally, we explore the semantic grounding of these models by fine-tuning them to predict the results of program execution. We find that even our best models are generally unable to predict the output of a program given a specific input.
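
To make "scales log-linearly with model size" concrete: the claim is that accuracy grows roughly linearly in the logarithm of the parameter count. The sketch below fits such a trend by ordinary least squares. The function name `fit_log_linear` is hypothetical, and the data points are synthetic placeholders generated from a known line, not results from the paper.

```python
import math
from typing import List, Tuple


def fit_log_linear(params: List[float], accuracy: List[float]) -> Tuple[float, float]:
    """Least-squares fit of accuracy ~= a + b * log10(params); returns (a, b)."""
    xs = [math.log10(p) for p in params]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    b = sxy / sxx                 # slope: accuracy gained per 10x more parameters
    a = mean_y - b * mean_x       # intercept
    return a, b


# Synthetic placeholder data lying exactly on a line in log10(params).
# These are NOT measurements from the paper.
sizes = [1e8, 1e9, 1e10, 1e11]
true_a, true_b = -70.0, 10.0
acc = [true_a + true_b * math.log10(s) for s in sizes]

a, b = fit_log_linear(sizes, acc)
print(round(a, 6), round(b, 6))
```

Under this functional form, every tenfold increase in parameter count adds about `b` percentage points of accuracy, which is why the relationship appears as a straight line when model size is plotted on a logarithmic axis.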

Cited by: 4 papers in your library

Cites: 21 papers in your library

Read: June 2, 2026

Tags

canonBenchmark
