2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Mantas Mazeika, L. Phan, X. Yin, Andy Zou, Zhengtao Wang, N. Mu, E. Sakhaee, N. Li, Steven Basart, Boxuan Li, D. A. Forsyth, Dan Hendrycks
