| Main Authors: | Wu, Sean, Gustafsson, Fredrik K., Phillips, Edward, Gao, Boyan, Thakur, Anshul, Clifton, David A. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.03216 |
Similar Items
Semantic Self-Distillation for Language Model Uncertainty
by: Phillips, Edward, et al.
Published: (2026)
Entropy Alone is Insufficient for Safe Selective Prediction in LLMs
by: Phillips, Edward, et al.
Published: (2026)
Geometric Uncertainty for Detecting and Correcting Hallucinations in LLMs
by: Phillips, Edward, et al.
Published: (2025)
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
by: Xing, Xingrun, et al.
Published: (2024)
Large Language Models in the Clinic: A Comprehensive Benchmark
by: Liu, Fenglin, et al.
Published: (2024)
Direct Confidence Alignment: Aligning Verbalized Confidence with Internal Confidence In Large Language Models
by: Zhang, Glenn, et al.
Published: (2025)
Towards Neural No-Resource Language Translation: A Comparative Evaluation of Approaches
by: Thakur, Madhavendra
Published: (2024)
Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges
by: Xiao, Xiao, et al.
Published: (2025)
Cognition Chain for Explainable Psychological Stress Detection on Social Media
by: Wang, Xin, et al.
Published: (2024)
Are Large Language Models Good Statisticians?
by: Zhu, Yizhang, et al.
Published: (2024)
Causal-aware Large Language Models: Enhancing Decision-Making Through Learning, Adapting and Acting
by: Chen, Wei, et al.
Published: (2025)
Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
by: Bani-Harouni, David, et al.
Published: (2025)
Confidence in the Reasoning of Large Language Models
by: Pawitan, Yudi, et al.
Published: (2024)
Benchmarking Pathology Foundation Models for Breast Cancer Survival Prediction
by: Gustafsson, Fredrik K., et al.
Published: (2026)
UNO Arena for Evaluating Sequential Decision-Making Capability of Large Language Models
by: Qin, Zhanyue, et al.
Published: (2024)
Calibrating the Confidence of Large Language Models by Eliciting Fidelity
by: Zhang, Mozhi, et al.
Published: (2024)
Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction
by: Gustafsson, Fredrik K., et al.
Published: (2024)
ConfTuner: Training Large Language Models to Express Their Confidence Verbally
by: Li, Yibo, et al.
Published: (2025)
Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models
by: Zhao, Xinran, et al.
Published: (2024)
BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems
by: Wang, Wei, et al.
Published: (2024)
GEMMA-SQL: A Novel Text-to-SQL Model Based on Large Language Models
by: Pandey, Hari Mohan, et al.
Published: (2025)
It's All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language Models
by: Santini, Cristian, et al.
Published: (2026)
Improving Clinical Dataset Condensation with Mode Connectivity-based Trajectory Surrogates
by: Nganjimi, Pafue Christy, et al.
Published: (2025)
Optimization-Inspired Few-Shot Adaptation for Large Language Models
by: Gao, Boyan, et al.
Published: (2025)
Confidence Estimation for Text-to-SQL in Large Language Models
by: Maleki, Sepideh Entezari, et al.
Published: (2025)
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
by: Zhou, Hongjian, et al.
Published: (2023)
Evaluating Computational Pathology Foundation Models for Prostate Cancer Grading under Distribution Shifts
by: Gustafsson, Fredrik K., et al.
Published: (2024)
Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence?
by: Ni, Shiyu, et al.
Published: (2024)
Confidence over Time: Confidence Calibration with Temporal Logic for Large Language Model Reasoning
by: Mao, Zhenjiang, et al.
Published: (2026)
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
by: Pradeep, Ronak, et al.
Published: (2025)
Self-Training Large Language Models with Confident Reasoning
by: Jang, Hyosoon, et al.
Published: (2025)
Closing the Confidence-Faithfulness Gap in Large Language Models
by: Miao, Miranda Muqing, et al.
Published: (2026)
When Quantization Affects Confidence of Large Language Models?
by: Proskurina, Irina, et al.
Published: (2024)
STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing
by: Zou, Jiaru, et al.
Published: (2024)
HumorRank: A Tournament-Based Leaderboard for Evaluating Humor Generation in Large Language Models
by: Ajayi, Edward, et al.
Published: (2026)
Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models
by: Kumar, Abhishek, et al.
Published: (2024)
Evaluating Large Language Models for Multimodal Simulated Ophthalmic Decision-Making in Diabetic Retinopathy and Glaucoma Screening
by: Tabuse, Cindy Lie, et al.
Published: (2025)
Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective
by: Fang, Hao, et al.
Published: (2026)
Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey
by: Liu, Xiaoou, et al.
Published: (2025)
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models
by: Li, Xiaomin, et al.
Published: (2025)