Parameter Efficient Reinforcement Learning From Human Feedback
H. Sidahmed, S. Phatale, A. Hutcheson, Zongyu Lin, Ziru Chen, Z. Yu, J. Jin, S. Chaudhary, R. Komarytsia, C. Ahlheim, Yuxuan Zhu, Boxuan Li, S. Ganesh, B. Byrne, J. Hoffmann, H. Mansoor, Wentao Li, Abhinav Rastogi, L. Dixon