:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Jiaxin; Xiong, Caiming; Wu, Chien-Sheng
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2601.15778

Similar Items

Agentic Uncertainty Quantification
by: Zhang, Jiaxin, et al.
Published: (2026)

Benchmarking Deep Search over Heterogeneous Enterprise Data
by: Choubey, Prafulla Kumar, et al.
Published: (2025)

Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
by: Xu, Austin, et al.
Published: (2025)

SiReRAG: Indexing Similar and Related Information for Multihop Reasoning
by: Zhang, Nan, et al.
Published: (2024)

The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation
by: Zhang, Jiaxin, et al.
Published: (2026)

Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
by: Huang, Kung-Hsiang, et al.
Published: (2025)

CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
by: Huang, Kung-Hsiang, et al.
Published: (2024)

Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms
by: Pandit, Shrey, et al.
Published: (2025)

CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions
by: Huang, Kung-Hsiang, et al.
Published: (2025)

Don't Think Twice! Over-Reasoning Impairs Confidence Calibration
by: Lacombe, Romain, et al.
Published: (2025)

Calibrating Verbalized Confidence with Self-Generated Distractors
by: Wang, Victor, et al.
Published: (2025)

Fact-Level Confidence Calibration and Self-Correction
by: Yuan, Yige, et al.
Published: (2024)

Double-Calibration: Towards Reliable LLMs via Calibrating Knowledge and Reasoning Confidence
by: Lu, Yuyin, et al.
Published: (2026)

Parameter-Efficient Detoxification with Contrastive Decoding
by: Niu, Tong, et al.
Published: (2024)

A Survey of Confidence Estimation and Calibration in Large Language Models
by: Geng, Jiahui, et al.
Published: (2023)

Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning
by: Zhang, Chuang, et al.
Published: (2026)

ConfTuner: Training Large Language Models to Express Their Confidence Verbally
by: Li, Yibo, et al.
Published: (2025)

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals
by: Torrielli, Federico, et al.
Published: (2026)

LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models
by: Stengel-Eskin, Elias, et al.
Published: (2024)

A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems
by: Ke, Zixuan, et al.
Published: (2025)

Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math
by: Pang, Bo, et al.
Published: (2025)

Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding
by: Wang, Ziyang, et al.
Published: (2025)

The Dunning-Kruger Effect in Large Language Models: An Empirical Study of Confidence Calibration
by: Ghosh, Sudipta, et al.
Published: (2026)

Improving the Calibration of Confidence Scores in Text Generation Using the Output Distribution's Characteristics
by: Flores, Lorenzo Jaime Yu, et al.
Published: (2025)

Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models
by: Chhikara, Prateek
Published: (2025)

VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning
by: Xiao, Wenyi, et al.
Published: (2026)

J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
by: Xu, Austin, et al.
Published: (2025)

Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory
by: Li, Weixian Waylon, et al.
Published: (2026)

JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking
by: Niu, Tong, et al.
Published: (2024)

HPE: Answering Complex Questions over Text by Hybrid Question Parsing and Execution
by: Liu, Ye, et al.
Published: (2023)

Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration
by: Yuan, Yi, et al.
Published: (2026)

Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
by: Bani-Harouni, David, et al.
Published: (2025)

MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools
by: Subramani, Nishant, et al.
Published: (2025)

Calibrated? Not for Everyone: How Sexual Orientation and Religious Markers Distort LLM Accuracy and Confidence in Medical QA
by: Testoni, Alberto, et al.
Published: (2026)

AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation
by: Chakrabarty, Tuhin, et al.
Published: (2025)

Oblivion: Self-Adaptive Agentic Memory Control through Decay-Driven Activation
by: Rana, Ashish, et al.
Published: (2026)

Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
by: Zhao, Zirui, et al.
Published: (2024)

Reward Models Identify Consistency, Not Causality
by: Xu, Yuhui, et al.
Published: (2025)

Graph-based Confidence Calibration for Large Language Models
by: Li, Yukun, et al.
Published: (2024)

How Confident Is the First Token? An Uncertainty-Calibrated Prompt Optimization Framework for Large Language Model Classification and Understanding
by: Chen, Wei, et al.
Published: (2026)