2025

Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation

Bowen Baker, J. Huizinga, Leo Gao, Z. Dou, M. Y. Guan, A. Madry, Wojciech Zaremba, J. Pachocki, D. Farhi

citations

Cite Score

13
