2025
Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning
M. Turpin, A. Arditi, M. Li, J. Benton, J. Michael