2021
AI summary
This paper explores the limits of large language models for program synthesis, introducing two new benchmarks, MBPP and MathQA-Python, and showing that synthesis performance scales log-linearly with model size, achieving up to 83.8% accuracy on MathQA-Python with fine-tuning.
Abstract
This paper explores the limits of the current generation of large language models for program synthesis in general-purpose programming languages. We evaluate a collection of such models (with between 244M and 137B parameters) on two new benchmarks, MBPP and MathQA-Python, in both the few-shot and fine-tuning regimes. Our benchmarks are designed to measure the ability of these models to synthesize short Python programs from natural language descriptions. The Mostly Basic Programming Problems (MBPP) dataset contains 974 programming tasks, designed to be solvable by entry-level programmers. The MathQA-Python dataset, a Python version of the MathQA benchmark, contains 23,914 problems that evaluate the ability of the models to synthesize code from more complex text. On both datasets, we find that synthesis performance scales log-linearly with model size. Our largest models, even without fine-tuning on a code dataset, can synthesize solutions to 59.6% of the problems from MBPP using few-shot learning with a well-designed prompt. Fine-tuning on a held-out portion of the dataset improves performance by about 10 percentage points across most model sizes. On the MathQA-Python dataset, the largest fine-tuned model achieves 83.8% accuracy. Going further, we study the model’s ability to engage in dialog about code, incorporating human feedback to improve its solutions. We find that natural language feedback from a human halves the error rate compared to the model’s initial prediction. Additionally, we conduct an error analysis to shed light on where these models fall short and what types of programs are most difficult to generate. Finally, we explore the semantic grounding of these models by fine-tuning them to predict the results of program execution. We find that even our best models are generally unable to predict the output of a program given a specific input.
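The abstract reports the fraction of problems for which a model can synthesize a solution, but does not spell out how a solution is checked. The sketch below shows one plausible way to score a benchmark of this kind. It assumes that each task comes with assert-style test cases and that a task counts as solved when any sampled program passes all of them. The helper names `passes_tests` and `fraction_solved`, and the toy tasks, are hypothetical and are not taken from the paper.

```python
from typing import List


def passes_tests(candidate_src: str, test_asserts: List[str]) -> bool:
    """Return True if the candidate program satisfies every assert statement."""
    namespace: dict = {}
    try:
        # Define the candidate's functions in a scratch namespace.
        exec(candidate_src, namespace)
        for test in test_asserts:
            # A failing assert raises AssertionError, caught below.
            exec(test, namespace)
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as failure.
        return False
    return True


def fraction_solved(samples_per_task: List[List[str]],
                    tests_per_task: List[List[str]]) -> float:
    """Assumed metric: a task is solved if any sampled program passes its tests."""
    solved = sum(
        any(passes_tests(src, tests) for src in samples)
        for samples, tests in zip(samples_per_task, tests_per_task)
    )
    return solved / len(tests_per_task)


# Toy data, invented for illustration only.
samples = [
    ["def add(a, b):\n    return a - b",      # wrong sample
     "def add(a, b):\n    return a + b"],     # correct sample
    ["def is_even(n):\n    return n % 2 == 1"],  # wrong sample
]
tests = [
    ["assert add(1, 2) == 3"],
    ["assert is_even(4)"],
]

print(fraction_solved(samples, tests))  # prints 0.5
```

Running model-generated code with `exec` is unsafe outside a sandbox; a real evaluation harness would isolate each candidate in a separate process with time and resource limits.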