2025
AI summary
This paper presents chain-of-thought (CoT) monitoring as an AI safety method: inspecting the natural-language reasoning of AI systems for signs of intent to misbehave. It recommends further research into and investment in CoT monitoring, while acknowledging that the method is imperfect and that CoT monitorability may be fragile.
Abstract
AI systems that "think" in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.