2022
Cite Score
48
AI summary
This paper introduces BIG-Bench Hard (BBH), a suite of 23 challenging language model tasks, and shows that Chain-of-Thought (CoT) prompting significantly improves performance for models like PaLM and Codex, surpassing human-rater baselines on many tasks.
Abstract
BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models. Language models have already made good progress on this benchmark, with the best model in the BIG-Bench paper outperforming average reported human-rater results on 65% of the BIG-Bench tasks via few-shot prompting. But on what tasks do language models fall short of average human-rater performance, and are those tasks actually unsolvable by current language models? In this work, we focus on a suite of 23 challenging BIG-Bench tasks which we call BIG-Bench Hard (BBH). These are the tasks for which prior language model evaluations did not outperform the average human-rater. We find that applying chain-of-thought (CoT) prompting to BBH tasks enables PaLM to surpass the average human-rater performance on 10 of the 23 tasks, and Codex (code-davinci-002) to surpass the average human-rater performance on 17 of the 23 tasks. Since many tasks in BBH require multi-step reasoning, few-shot prompting without CoT, as done in the BIG-Bench evaluations (Srivastava et al., 2022), substantially underestimates the best performance and capabilities of language models, which is better captured via CoT prompting. As further analysis, we explore the interaction between CoT and model scale on BBH, finding that CoT enables emergent task performance on several BBH tasks with otherwise flat scaling curves.
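The contrast the abstract draws between answer-only few-shot prompting and CoT prompting can be sketched as follows. This is a minimal illustration, not the paper's actual prompt format: the `Exemplar` class, the `build_prompt` helper, the "Q:/A:" layout, the "So the answer is" phrasing, and the toy question are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One hypothetical few-shot demonstration."""
    question: str
    rationale: str  # hand-written intermediate reasoning (assumed for this sketch)
    answer: str


def build_prompt(exemplars, question, use_cot):
    """Assemble a few-shot prompt, with or without reasoning in the exemplars."""
    parts = []
    for ex in exemplars:
        if use_cot:
            # CoT: show the reasoning steps before the final answer.
            target = f"{ex.rationale} So the answer is {ex.answer}."
        else:
            # Answer-only: show just the final answer.
            target = ex.answer
        parts.append(f"Q: {ex.question}\nA: {target}")
    # The test question is left open for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Toy exemplar, invented for illustration only.
exemplars = [Exemplar("Is 3 + 4 even?", "3 + 4 = 7, and 7 is odd.", "no")]
answer_only_prompt = build_prompt(exemplars, "Is 2 + 2 even?", use_cot=False)
cot_prompt = build_prompt(exemplars, "Is 2 + 2 even?", use_cot=True)
```

The only difference between the two prompts is whether each demonstration includes its reasoning, which is the variable the abstract identifies as mattering for multi-step tasks.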
References [53]
Jacob Devlin, M. W. Chang, K. Lee, Kristina Toutanova - 2018
39 papers in library cite
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei - 2020
21 papers in library cite
Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, C. Wainwright, Pamela Mishkin, Chiyuan Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe - 2022
11 papers in library cite
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba - 2021
9 papers in library cite
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei - 2020
12 papers in library cite
Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le - 2021
4 papers in library cite
Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aman Gupta, Adria Garriga Alonso - 2022
4 papers in library cite
Colin Raffel, Noam Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, Wentao Li, P. J. Liu - 2019
17 papers in library cite
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, E. Chi, Quoc V. Le, Denny Zhou - 2022
10 papers in library cite
Aakanksha Chowdhery, S. Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, A. Roberts, P. Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann - 2023
6 papers in library cite
Jason Wei, Maarten Bosma, V. Y. Zhao, K. Guu, A. W. Yu, B. Lester, N. Du, A. M. Dai, Quoc V. Le - 2021
3 papers in library cite
T. Kojima, Shixiang Shane Gu, M. Reid, Y. Matsuo, Y. Iwasawa - 2022
6 papers in library cite
Jason Wei, Yi Tay, R. Bommasani, Colin Raffel, Barret Zoph, S. Borgeaud, D. Yogatama, Maarten Bosma, Denny Zhou, D. Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeffrey Dean, William Fedus - 2022
2 papers in library cite
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, E. Chi, Denny Zhou - 2022
5 papers in library cite
V. Sanh, A. Webson, Colin Raffel, S. H. Bach, L. Sutawika, Z. Alyafeai, A. Chaffin, A. Stiegler, T. L. Scao, A. Raja - 2021
4 papers in library cite
2021
4 papers in library cite
Maxwell Nye, A. J. Andreassen, Guy Gur Ari, Henryk Michalewski, Jacob Austin, D. Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, D. Luan, Charles Sutton, Augustus Odena - 2021
5 papers in library cite
Saurav Kadavath, Tom Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield Dodds, Nova Dassarma, Eli Tran Johnson, Scott Johnston, Sheer El Showk, Andy Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, S. Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, J. Jacobson, Jackson Kernion, Shauna Kravec, Liane Lovitt, Kamal Ndousse, Catherine Olsson, Sam Ringer, Dario Amodei, Tom B. Brown, Jack Clark, Nicholas Joseph, Benjamin Mann, Sam McCandlish, Christopher Olah, Jared Kaplan - 2022
3 papers in library cite
Danny Hernandez, Jared Kaplan, Tom Henighan, Sam McCandlish - 2021
5 papers in library cite
Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hananneh Hajishirzi - 2021
4 papers in library cite
Zhuoye Zhao, E. Wallace, S. Feng, Dan Klein, Shivalika Singh - 2021
3 papers in library cite
Yiwei Li, Zongyu Lin, S. Zhang, Q. Fu, Berlin Chen, J. G. Lou, Weizhu Chen - 2022
3 papers in library cite
Antonia Creswell, M. Shanahan, Irina Higgins - 2022
3 papers in library cite
Ethan Perez, Douwe Kiela, Kyunghyun Cho - 2021
3 papers in library cite
Timo Schick, Hinrich Schutze - 2020
2 papers in library cite
Genta Indra Winata, Andrea Madotto, Zongyu Lin, Rosanne Liu, Jason Yosinski, Pascale Fung - 2021
2 papers in library cite
F. Shi, Mirac Suzgun, M. Freitag, Xuezhi Wang, S. Srivats, S. Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei - 2023
2 papers in library cite
Denny Zhou, Nathanael Scharli, L. Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, C. Cui, O. Bousquet, Quoc V. Le, Ed H. Chi - 2023
2 papers in library cite
Deep Ganguli, Danny Hernandez, Liane Lovitt, Nova Dassarma, Tom Henighan, Andy Jones, Nicholas Joseph, Jackson Kernion, Benjamin Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Nelson Elhage, Sheer El Showk, Stanislav Fort, Zac Hatfield Dodds, Scott Johnston, Shauna Kravec, Neel Nanda, Kamal Ndousse, Catherine Olsson, Daniela Amodei, Dario Amodei, Tom B. Brown, Jared Kaplan, Sam McCandlish, Christopher Olah, Jack Clark - 2022
2 papers in library cite
B. Lester, R. A. Rfou, Noah Constant - 2021
2 papers in library cite
E. Reif, Daphne Ippolito, A. Yuan, A. Coenen, Chris Callison Burch, Jason Wei - 2022
1 paper in library cites
A. Mittal, Yuandong Tian, Nanyun Peng - 2022
1 paper in library cites
S. M. Xie, A. Raghunathan, Percy Liang, T. Ma - 2021
1 paper in library cites
Zhoujun Cheng, Tianbao Xie, P. Shi, Chun-Liang Li, R. Nadkarni, Y. Hu, Caiming Xiong, D. R. Radev, M. Ostendorf, Luke Zettlemoyer - 2022
1 paper in library cites
M. Abdou, A. Kulmizev, D. Hershcovich, S. Frank, Ellie Pavlick, A. Sogaard - 2021
1 paper in library cites
A. K. Lampinen, I. Dasgupta, S. C. Chan, K. Matthewson, M. H. Tessler, Antonia Creswell, J. L. Mcclelland, J. X. Wang, F. Hill - 2022
1 paper in library cites
Yiwei Li, D. Choi, J. Chung, Nate Kushman, J. Schrittwieser, R. Leblond, T. Eccles, J. Keeling, F. Gimeno, A. D. Lago - 2022
1 paper in library cites
A. Drozdov, Nathanael Scharli, E. Akyurek, Nathan Scales, X. Song, X. Chen, O. Bousquet, Denny Zhou - 2022
1 paper in library cites
A. Webson, Ellie Pavlick - 2021
1 paper in library cites
A. Marasovic, I. Beltagy, D. Downey, M. E. Peters - 2022
1 paper in library cites
Weizhu Chen - 2022
1 paper in library cites
R. Patel, Ellie Pavlick - 2022
1 paper in library cites
S. Min, Martha Lewis, Luke Zettlemoyer, Hananneh Hajishirzi - 2022
1 paper in library cites
J. Stacey, Yonatan Belinkov, M. Rei - 2021
1 paper in library cites
Z. Talat, H. Blix, J. Valvoda, M. I. Ganesh, R. Cotterell, A. Williams - 2022
1 paper in library cites
Mirac Suzgun, L. M. Kyriazi, Dan Jurafsky - 2022
1 paper in library cites
S. Wiegreffe, J. Hessel, Swabha Swayamdipta, M. Riedl, Yejin Choi - 2022
1 paper in library cites
S. Min, X. Lyu, Ari Holtzman, M. Artetxe, Martha Lewis, Hananneh Hajishirzi, Luke Zettlemoyer - 2022
1 paper in library cites
Yi Tay, Mostafa Dehghani, S. Abnar, Hyung Won Chung, William Fedus, J. Rao, S. Narang, V. Q. Tran, D. Yogatama, D. Metzler - 2022
1 paper in library cites
Aman Madaan, A. Yazdanbakhsh - 2022
1 paper in library cites
W. Zhou, Jiaxi Hu, Haowei Zhang, X. Liang, Maosong Sun, Caiming Xiong, Jie Tang - 2020
1 paper in library cites
P. Hase, Mohit Bansal - 2022
1 paper in library cites