Note: * denotes equal contribution.
🧰 Tool Use & Agent
Under Review
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents
Jiayu Liu, Cheng Qian, Zhaochen Su, Qing Zong, Shijue Huang, Bingxiang He, Yi R. Fung
Under Review
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?
Dadi Guo*, Qingyu Liu*, Yuejin Xie, Jiayu Liu, Zhiyuan Fan, Qihan Ren, Shuai Shao, Tianyi Zhou, Dongrui Liu, Yi R. Fung
Under Review
PACE: A Factor-Guided Coarse-to-Fine Agentic Framework for Long-Video Understanding
Baixuan Xu, Yinyui XU, Tianshi Zheng, Zhaowei Wang, Weiqi Wang, Wei Fan, Haochen Shi, Jiayu Liu, Qing Zong, Xiyu REN, Xinyu Geng, Zhitao He, Yangqiu Song
🧠 Advanced Reasoning
ICLR 2026
Diversity-Enhanced Reasoning for Subjective Questions
Yumeng Wang*, Zhiyuan Fan*, Jiayu Liu*, Jen-tse Huang, Yi R. Fung
MathNLP 2025
Mathematical Proof as a Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models
Dadi Guo*, Jiayu Liu*, Zhiyuan Fan, Zhitao He, Haoran Li, Yumeng Wang, Yi R. Fung
Under Review
VLM-Dixit: Investigating Multi-Modal Abductive Reasoning and Entailment Verification with VLMs in Dixit Gameplay
MO Yunxiang*, Tianshi Zheng*, Qing Zong, Jiayu Liu, Baixuan Xu, Yauwai Yim, Chunkit Chan, Jiaxin Bai, Yangqiu Song
Under Review
Rethinking Reinforcement Learning from Intrinsic Feedback for LLM Reasoning: Data Insensitivity and Limited Generalization
Qingcheng Zeng, Heli Qi, Yutong Yin, Jiarui Liu, Zeqi Zhou, Jiayu Liu, Weihao Xuan, Rob Voigt, Zhaoran Wang, Naoto Yokoya
🧩 Trustworthiness and Reliability
ACL 2025
Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?
Jiayu Liu, Qing Zong, Weiqi Wang, Yangqiu Song
FEVER 2024
GProofT: A Multi-dimension Multi-round Fact Checking Framework Based on Claim Fact Extraction
Jiayu Liu*, Junhao Tang*, Hanwen Wang*, Baixuan Xu, Haochen Shi, Weiqi Wang, Yangqiu Song
Under Review
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Jiayu Liu*, Rui Wang*, Qing Zong, Qingcheng Zeng, Tianshi Zheng, Haochen Shi, Dadi Guo, Baixuan Xu, Chunyang Li, Yangqiu Song
Under Review
Prospect Theory Fails for LLMs: Revealing Instability of Decision-Making under Epistemic Uncertainty
Rui Wang*, Qihan Lin*, Jiayu Liu*, Qing Zong, Tianshi Zheng, Weiqi Wang, Yangqiu Song
Under Review
CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?
Qing Zong, Jiayu Liu, Tianshi Zheng, Chunyang Li, Baixuan Xu, Haochen Shi, Weiqi Wang, Zhaowei Wang, Chunkit Chan, Yangqiu Song