| Main Authors: | Zhang, Jiaxin, Xiong, Caiming, Wu, Chien-Sheng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.15778 |
Similar Items
Agentic Uncertainty Quantification
by: Zhang, Jiaxin, et al.
Published: (2026)
Benchmarking Deep Search over Heterogeneous Enterprise Data
by: Choubey, Prafulla Kumar, et al.
Published: (2025)
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
by: Xu, Austin, et al.
Published: (2025)
SiReRAG: Indexing Similar and Related Information for Multihop Reasoning
by: Zhang, Nan, et al.
Published: (2024)
The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation
by: Zhang, Jiaxin, et al.
Published: (2026)
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
by: Huang, Kung-Hsiang, et al.
Published: (2025)
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
by: Huang, Kung-Hsiang, et al.
Published: (2024)
Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms
by: Pandit, Shrey, et al.
Published: (2025)
CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions
by: Huang, Kung-Hsiang, et al.
Published: (2025)
Don't Think Twice! Over-Reasoning Impairs Confidence Calibration
by: Lacombe, Romain, et al.
Published: (2025)
Calibrating Verbalized Confidence with Self-Generated Distractors
by: Wang, Victor, et al.
Published: (2025)
Fact-Level Confidence Calibration and Self-Correction
by: Yuan, Yige, et al.
Published: (2024)
Double-Calibration: Towards Reliable LLMs via Calibrating Knowledge and Reasoning Confidence
by: Lu, Yuyin, et al.
Published: (2026)
Parameter-Efficient Detoxification with Contrastive Decoding
by: Niu, Tong, et al.
Published: (2024)
A Survey of Confidence Estimation and Calibration in Large Language Models
by: Geng, Jiahui, et al.
Published: (2023)
Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning
by: Zhang, Chuang, et al.
Published: (2026)
ConfTuner: Training Large Language Models to Express Their Confidence Verbally
by: Li, Yibo, et al.
Published: (2025)
Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals
by: Torrielli, Federico, et al.
Published: (2026)
LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models
by: Stengel-Eskin, Elias, et al.
Published: (2024)
A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems
by: Ke, Zixuan, et al.
Published: (2025)
Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math
by: Pang, Bo, et al.
Published: (2025)
Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding
by: Wang, Ziyang, et al.
Published: (2025)
The Dunning-Kruger Effect in Large Language Models: An Empirical Study of Confidence Calibration
by: Ghosh, Sudipta, et al.
Published: (2026)
Improving the Calibration of Confidence Scores in Text Generation Using the Output Distribution's Characteristics
by: Flores, Lorenzo Jaime Yu, et al.
Published: (2025)
Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models
by: Chhikara, Prateek
Published: (2025)
VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning
by: Xiao, Wenyi, et al.
Published: (2026)
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
by: Xu, Austin, et al.
Published: (2025)
Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory
by: Li, Weixian Waylon, et al.
Published: (2026)
JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking
by: Niu, Tong, et al.
Published: (2024)
HPE: Answering Complex Questions over Text by Hybrid Question Parsing and Execution
by: Liu, Ye, et al.
Published: (2023)
Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration
by: Yuan, Yi, et al.
Published: (2026)
Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
by: Bani-Harouni, David, et al.
Published: (2025)
MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools
by: Subramani, Nishant, et al.
Published: (2025)
Calibrated? Not for Everyone: How Sexual Orientation and Religious Markers Distort LLM Accuracy and Confidence in Medical QA
by: Testoni, Alberto, et al.
Published: (2026)
AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation
by: Chakrabarty, Tuhin, et al.
Published: (2025)
Oblivion: Self-Adaptive Agentic Memory Control through Decay-Driven Activation
by: Rana, Ashish, et al.
Published: (2026)
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
by: Zhao, Zirui, et al.
Published: (2024)
Reward Models Identify Consistency, Not Causality
by: Xu, Yuhui, et al.
Published: (2025)
Graph-based Confidence Calibration for Large Language Models
by: Li, Yukun, et al.
Published: (2024)
How Confident Is the First Token? An Uncertainty-Calibrated Prompt Optimization Framework for Large Language Model Classification and Understanding
by: Chen, Wei, et al.
Published: (2026)