Jiayu LIU 刘家毓

👋 Welcome to my homepage! 🥂
I’m Jiayu LIU 刘家毓, a third-year undergraduate CS student supervised by Prof. Yangqiu Song and Prof. Yiren Fung at HKUST.


💞️ I’m passionate about playing piano, violin, and football, and working out in the gym.

🌱 I’m currently interested in Natural Language Processing, especially in:

  • Improving LLM trustworthiness:
    GProofT (FEVER 2024),
    MarConf (ACL 2025 Main),
    MarPT (Under review in ACL Rolling Review),
    CritiCal (Under review in ACL Rolling Review).

  • Evaluating and enhancing LLM reasoning capabilities:
    RFMDataset (MathNLP 2025, under review in ACL Rolling Review),
    Multirole-R1 (Under review at ICLR 2026).

  • Advanced tool-use capabilities in agentic systems:
    CostBench (Under review in ACL Rolling Review).

🖋️ Google Scholar
📫 Contact: jliufv@connect.ust.hk
😄 Pronouns: He/Him


🔥 News


📖 Publications

🧩 Trustworthiness and Reliability

Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models’ Uncertainty?
Jiayu Liu, Qing Zong, Weiqi Wang, Yangqiu Song
ACL 2025 Main

GProofT: A Multi-dimension Multi-round Fact Checking Framework Based on Claim Fact Extraction
Jiayu Liu*, Junhao Tang*, Hanwen Wang*, Baixuan Xu, Haochen Shi, Weiqi Wang, Yangqiu Song
Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER)

Prospect Theory Fails for LLMs: Revealing Instability of Decision-Making under Epistemic Uncertainty
Rui Wang*, Qihan Lin*, Jiayu Liu*, Qing Zong, Tianshi Zheng, Weiqi Wang, Yangqiu Song
Under review in ACL Rolling Review

CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?
Qing Zong, Jiayu Liu, Tianshi Zheng, Chunyang Li, Baixuan Xu, Haochen Shi, Weiqi Wang, Zhaowei Wang, Chunkit Chan, Yangqiu Song
Under review in ACL Rolling Review


🧠 Advanced Reasoning

Diversity-Enhanced Reasoning for Subjective Questions
Yumeng Wang*, Zhiyuan Fan*, Jiayu Liu*, Jen-tse Huang, Yi R. Fung
Under review at ICLR 2026

Mathematical Proof as a Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models
Dadi Guo*, Jiayu Liu*, Zhiyuan Fan, Zhitao He, Haoran Li, Yumeng Wang, Yi R. Fung
MathNLP 2025, under review in ACL Rolling Review


🧰 Tool Use

CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Agents
Jiayu Liu, Cheng Qian, Zhaochen Su, Qing Zong, Shijue Huang, Bingxiang He, Yi R. Fung
Under review in ACL Rolling Review


🤝 Collaboration

VLM-Dixit: Investigating Multi-Modal Abductive Reasoning and Entailment Verification with VLMs in Dixit Gameplay
Yunxiang Mo*, Tianshi Zheng*, Qing Zong, Jiayu Liu, Baixuan Xu, Yauwai Yim, Chunkit Chan, Jiaxin Bai, Yangqiu Song
The 5th Wordplay: When Language Meets Games @ EMNLP 2025


🧾 Academic Services

  • [2025/5] Reviewer for IJCAI 2025