Jiayu LIU 刘家毓

👋 Welcome to my homepage! 🥂
I’m Jiayu LIU 刘家毓, a junior undergraduate CS student at HKUST. I am currently an exchange student at UIUC and an undergraduate research intern advised by Prof. Heng Ji, Prof. Dilek Hakkani-Tür, and Prof. Gokhan Tur. Previously, I was supervised by Prof. Yangqiu Song and Prof. Yiren Fung at HKUST.

My research goal is to build LLMs and LLM agents that are both robust and adaptive.

My current research focuses on:

Adaptiveness and robustness of LLM agents

Building LLM agents that can adapt their plans and remain reliable under uncertainty:

  1. Adaptiveness: Evaluate and analyze LLM agents' adaptive planning abilities ([CostBench], [AdaPlanBench])
  2. Robustness: Analyze the generalization of epistemic markers ([MarConf], [MarPT]) and improve LLMs' noise awareness and robustness in RAG scenarios ([NAACL])

LLM reasoning capabilities

Pinpointing crucial flaws in LLM reasoning and training reasoning models capable of diverse thinking:

  1. Evaluation and Analysis: Identify the causes of reasoning failures (mathematical proof reasoning [RFMBench], RLIF [Rethinking RLIF])
  2. Methods: Self-evolution with verifiable signals (Diversity-Enhanced Reasoning with RL [Multirole-R1], Self-evolution via code [Code2Math])

Here is my Google Scholar. 📫 Contact: jliufv@connect.ust.hk


🔥 News

[2026/6] 🔬 Honored to join UIUC ConvAI Lab as an undergraduate research intern! Looking forward to learning from Prof. Dilek Hakkani-Tür and Prof. Gokhan Tur!
[2026/4] 🔬 Founded HKUST UGAI Lab, aiming to advance undergraduate AI research at HKUST through targeted mentorship, dedicated funding support, and recognized platforms.
[2026/4] 🎉 Three papers accepted to ACL 2026 (CostBench, RFMBench, DixitWorld)! Huge thanks to all collaborators!
[2026/1] 🎉 My co-first-author paper Diversity-Enhanced Reasoning for Subjective Questions is accepted to ICLR 2026!
[2025/12] 🔬 Honored to join UIUC BLENDER Lab as an undergraduate research intern! Looking forward to learning from Prof. Heng Ji!
[2025/8] 🏅 Honored to receive the UROP Support Grant and UROP Research Travel Sponsorship!
[2025/7] 🔥 Released Diversity-Enhanced Subjective Question-Answering, which got 26 upvotes and ranked #8 in Hugging Face Daily Papers (July 29th)!
[2025/7] ✈️ Will join the University of Illinois Urbana-Champaign as an exchange undergraduate student in Spring 2026!
[2025/5] 🎉 My first-author paper Revisiting Epistemic Markers in Confidence Estimation is accepted to ACL 2025 Main! Sincere gratitude to all my collaborators!
[2025/2] 🔬 Honored to join HKUST RenAI Lab as an undergraduate research intern! Looking forward to learning from Prof. Yiren Fung!
[2025/1] 🏅 Honored to receive HKIE Scholarship 2024/25!
[2024/10] 🎉 My co-first-author paper GProofT is accepted to the Seventh FEVER Workshop!
[2024/9] 🏅 Honored to receive The Joseph Lau Luen Hung Charitable Trust Scholarship 2024/25!
[2024/6] ✈️ Traveled to Charles University in Prague for a summer exchange! Wonderful experience; loved everything there 🥰
[2024/6] 🔬 Honored to join HKUST KnowComp Group as an undergraduate research intern! Looking forward to learning from Prof. Yangqiu Song!
[2023/9] 🏅 Honored to receive the China Soong Ching Ling Foundation Zhiyuan Bursary!

📖 Selected Publications

Note: Only first author/co-first author papers are listed. Please refer to the publications page for full publications. * denotes equal contribution.

Revisiting Epistemic Markers
ACL 2025
Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?
Jiayu Liu, Qing Zong, Weiqi Wang, Yangqiu Song
CostBench
ACL 2026
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents
Jiayu Liu, Cheng Qian, Zhaochen Su, Qing Zong, Shijue Huang, Bingxiang He, Yi R. Fung
GProofT
FEVER 2024
GProofT: A Multi-dimension Multi-round Fact Checking Framework Based on Claim Fact Extraction
Jiayu Liu*, Junhao Tang*, Hanwen Wang*, Baixuan Xu, Haochen Shi, Weiqi Wang, Yangqiu Song
Mathematical Proof as a Litmus Test
ACL 2026
Mathematical Proof as a Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models
Dadi Guo*, Jiayu Liu*, Zhiyuan Fan, Zhitao He, Haoran Li, Yumeng Wang, Yi R. Fung
Diversity-Enhanced Reasoning
ICLR 2026
Diversity-Enhanced Reasoning for Subjective Questions
Yumeng Wang*, Zhiyuan Fan*, Jiayu Liu*, Jen-tse Huang, Yi R. Fung
AdaPlanBench
Under Review
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints
Jiayu Liu, Cheng Qian, Zhenhailong Wang, Bingxuan Li, Jiateng Liu, Heng Wang, Jeonghwan Kim, Yumeng Wang, Xiusi Chen, Yi R. Fung, Heng Ji
NAACL
Under Review
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Jiayu Liu*, Rui Wang*, Qing Zong, Qingcheng Zeng, Tianshi Zheng, Haochen Shi, Dadi Guo, Baixuan Xu, Chunyang Li, Yangqiu Song
Prospect Theory Fails
Under Review
Prospect Theory Fails for LLMs: Revealing Instability of Decision-Making under Epistemic Uncertainty
Rui Wang*, Qihan Lin*, Jiayu Liu*, Qing Zong, Tianshi Zheng, Weiqi Wang, Yangqiu Song

🧾 Academic & community services

  • Founder@HKUST-UGAIL
  • Reviewer@IJCAI 2025, Reviewer@COLM 2026, Reviewer@ARR (since ARR May 2026)
  • HKUST COMP and CPEG Mentor 2024/25
  • HKUST PMP group mentor
  • IT Secretary of Chinese Folks and Arts Society, HKUST

🎤 Invited Talks

  • “Towards Cost-aware Tool Integrated Planning.” Invited talk at Natural Language Processing and Multimodal Intelligence Platform (Post), Host: Wenjie Li.

Misc

In my spare time, I’m passionate about music and sports. I play the piano and violin, and I also enjoy singing and sharing my performances on social media. For sports, football is my absolute favorite: I’m a member of both the HKUST Mainland Students Football Team and the Guangdong Experimental High School Football Team, and I truly cherish the memories and friendships from those times. I also enjoy sailing at sea; yachting gives me an incredible sense of freedom.

Feel free to check out some of my music:

  • Me playing Chopin’s Fantaisie-Impromptu: YouTube
  • Me performing the Chinese ballad Why Are the Flowers So Red: YouTube
  • My singing profile: WeSing (全民k歌) (~200 fans)