2024
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
C. Laidlaw, S. Singhal, A. D. Dragan