Selected/Recent Publications

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Youliang Yuan; Wenxiang Jiao; Wenxuan Wang; Jen-tse Huang; Jiahao Xu; Tian Liang; Pinjia He; Zhaopeng Tu.
arXiv 2024

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

Youliang Yuan; Wenxiang Jiao; Wenxuan Wang; Jen-tse Huang; Pinjia He; Shuming Shi; Zhaopeng Tu.
ICLR 2024

Does ChatGPT Know that It Does Not Know? Evaluating the Black-Box Calibration of ChatGPT

Youliang Yuan; Wenxuan Wang; Qingshuo Guo; Yiming Xiong; Chihao Shen; Pinjia He.
COLING 2024 (Oral)

Complete list: [Google Scholar]

Projects