2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika, L. Phan, X. Yin, Andy Zou, Zifan Wang, N. Mu, E. Sakhaee, N. Li, Steven Basart, Bo Li, D. A. Forsyth, Dan Hendrycks