Publications

You can also find my articles on my Google Scholar profile.
Note: * denotes equal contribution.

🧰 Tool Use & Agents

ACL 2026
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents
Jiayu Liu, Cheng Qian, Zhaochen Su, Qing Zong, Shijue Huang, Bingxiang He, Yi R. Fung
Under Review
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints
Jiayu Liu, Cheng Qian, Zhenhailong Wang, Bingxuan Li, Jiateng Liu, Heng Wang, Jeonghwan Kim, Yumeng Wang, Xiusi Chen, Yi R. Fung, Heng Ji
Under Review
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?
Dadi Guo*, Qingyu Liu*, Yuejin Xie, Jiayu Liu, Zhiyuan Fan, Qihan Ren, Shuai Shao, Tianyi Zhou, Dongrui Liu, Yi R. Fung
Under Review
PACE: A Factor-Guided Coarse-to-Fine Agentic Framework for Long-Video Understanding
Baixuan Xu, Yinyui XU, Tianshi Zheng, Zhaowei Wang, Weiqi Wang, Wei Fan, Haochen Shi, Jiayu Liu, Qing Zong, Xiyu REN, Xinyu Geng, Zhitao He, Yangqiu Song

🧠 Advanced Reasoning

ICLR 2026
Diversity-Enhanced Reasoning for Subjective Questions
Yumeng Wang*, Zhiyuan Fan*, Jiayu Liu*, Jen-tse Huang, Yi R. Fung
ACL 2026
Mathematical Proof as a Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models
Dadi Guo*, Jiayu Liu*, Zhiyuan Fan, Zhitao He, Haoran Li, Yumeng Wang, Yi R. Fung
ACL 2026
DixitWorld: Evaluating Multimodal Abductive Reasoning in Vision-Language Models with Multi-Agent Dixit Gameplay
MO Yunxiang*, Tianshi Zheng*, Qing Zong, Jiayu Liu, Baixuan Xu, Yauwai Yim, Chunkit Chan, Jiaxin Bai, Yangqiu Song
Under Review
Rethinking Reinforcement Learning from Intrinsic Feedback for LLM Reasoning: Data Insensitivity and Limited Generalization
Qingcheng Zeng, Heli Qi, Yutong Yin, Jiarui Liu, Zeqi Zhou, Jiayu Liu, Weihao Xuan, Rob Voigt, Zhaoran Wang, Naoto Yokoya

🧩 Trustworthiness and Reliability

ACL 2025
Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?
Jiayu Liu, Qing Zong, Weiqi Wang, Yangqiu Song
FEVER 2024
GProofT: A Multi-dimension Multi-round Fact Checking Framework Based on Claim Fact Extraction
Jiayu Liu*, Junhao Tang*, Hanwen Wang*, Baixuan Xu, Haochen Shi, Weiqi Wang, Yangqiu Song
Under Review
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Jiayu Liu*, Rui Wang*, Qing Zong, Qingcheng Zeng, Tianshi Zheng, Haochen Shi, Dadi Guo, Baixuan Xu, Chunyang Li, Yangqiu Song
Under Review
Prospect Theory Fails for LLMs: Revealing Instability of Decision-Making under Epistemic Uncertainty
Rui Wang*, Qihan Lin*, Jiayu Liu*, Qing Zong, Tianshi Zheng, Weiqi Wang, Yangqiu Song
Under Review
CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?
Qing Zong, Jiayu Liu, Tianshi Zheng, Chunyang Li, Baixuan Xu, Haochen Shi, Weiqi Wang, Zhaowei Wang, Chunkit Chan, Yangqiu Song