2025

Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning

M. Turpin, A. Arditi, M. Li, J. Benton, J. Michael



