Selected/Recent Publications

Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards

Youliang Yuan, Qiuyang Mang, Jingbang Chen, Hong Wan, Xiaoyuan Liu, Junjielong Xu, Jen-tse Huang, Wenxuan Wang, Wenxiang Jiao, Pinjia He
Preprint

Towards Evaluating Proactive Risk Awareness of Multimodal Language Models

Youliang Yuan, Wenxiang Jiao, Yuejin Xie, Chihao Shen, Menghan Tian, Wenxuan Wang, Jen-tse Huang, Pinjia He
NeurIPS 2025 (Datasets and Benchmarks Track)

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu
ACL 2025

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shuming Shi, Zhaopeng Tu
ICLR 2024

Does ChatGPT Know that It Does Not Know? Evaluating the Black-Box Calibration of ChatGPT

Youliang Yuan, Wenxuan Wang, Qingshuo Guo, Yiming Xiong, Chihao Shen, Pinjia He
COLING 2024 (Oral)

Projects

Complete list: [Google Scholar]