:: Library Catalog

Bibliographic Details
Main Authors:	Xiaohu, Xie, Xiaohu, Liu, Benjamin, Yao
Format:	Preprint
Published:	2026
Subjects:	Machine Learning, Computation and Language
Online Access:	https://arxiv.org/abs/2603.06604

Similar Items

CaRT: Teaching LLM Agents to Know When They Know Enough
by: Liu, Grace, et al.
Published: (2025)

Confidence Estimation for Error Detection in Text-to-SQL Systems
by: Somov, Oleg, et al.
Published: (2025)

Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential
by: Samragh, Mohammad, et al.
Published: (2025)

Show Your Work with Confidence: Confidence Bands for Tuning Curves
by: Lourie, Nicholas, et al.
Published: (2023)

Do Androids Know They're Only Dreaming of Electric Sheep?
by: CH-Wang, Sky, et al.
Published: (2023)

Knowing When to Ask: Segment-Level Credit Assignment for LLM Tool Use
by: Kumar, Abhijit, et al.
Published: (2026)

Process Supervision of Confidence Margin for Calibrated LLM Reasoning
by: Wang, Liaoyaqi, et al.
Published: (2026)

Divide-or-Conquer? Which Part Should You Distill Your LLM?
by: Wu, Zhuofeng, et al.
Published: (2024)

Know Your Limits: Entropy Estimation Modeling for Compression and Generalization
by: Badger, Benjamin L., et al.
Published: (2025)

Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework
by: Bhatt, Neel P., et al.
Published: (2024)

NanoKnow: How to Know What Your Language Model Knows
by: Gu, Lingwei, et al.
Published: (2026)

Your Simulation Runs but Solves the Wrong Physics: PDE-Grounded Intent Verification for LLM-Generated Multiphysics Simulation Code
by: Song, Zhenghan, et al.
Published: (2026)

Knowing When to Defer: Selective Prediction for Responsible Knowledge Tracing
by: Mitton, Joshua, et al.
Published: (2025)

Confidence Geometry Reveals Trace-Level Correctness in Large Language Model Reasoning
by: Liu, Shuo, et al.
Published: (2026)

Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation
by: Li, Zhuohang, et al.
Published: (2024)

To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity
by: Sedova, Anastasiia, et al.
Published: (2024)

Do Small Language Models Know When They're Wrong? Confidence-Based Cascade Scoring for Educational Assessment
by: Burleigh, Tyler
Published: (2026)

Are LLM Decisions Faithful to Verbal Confidence?
by: Wang, Jiawei, et al.
Published: (2026)

A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction
by: van Niekerk, Carel, et al.
Published: (2023)

DRA-GRPO: Your GRPO Needs to Know Diverse Reasoning Paths for Mathematical Reasoning
by: Chen, Xiwen, et al.
Published: (2025)

ReMoDetect: Reward Models Recognize Aligned LLM's Generations
by: Lee, Hyunseok, et al.
Published: (2024)

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
by: Li, Pengyi, et al.
Published: (2025)

What is Wrong with Perplexity for Long-context Language Modeling?
by: Fang, Lizhe, et al.
Published: (2024)

Knowing What You Know Is Not Enough: Large Language Model Confidences Don't Align With Their Actions
by: Pal, Arka, et al.
Published: (2025)

Neural Grammatical Error Correction for Romanian
by: Cotet, Teodor-Mihai, et al.
Published: (2026)

Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment
by: Kumar, Sayantan, et al.
Published: (2026)

Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents
by: Wang, Hao, et al.
Published: (2026)

Let the Code LLM Edit Itself When You Edit the Code
by: He, Zhenyu, et al.
Published: (2024)

Large AI Model Empowered Multimodal Semantic Communications
by: Jiang, Feibo, et al.
Published: (2023)

Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
by: Wei, Zhipeng, et al.
Published: (2024)

Cycles of Thought: Measuring LLM Confidence through Stable Explanations
by: Becker, Evan, et al.
Published: (2024)

MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes
by: Abacha, Asma Ben, et al.
Published: (2024)

Assessing LLM Text Detection in Educational Contexts: Does Human Contribution Affect Detection?
by: Gehring, Lukas, et al.
Published: (2025)

Align-then-Unlearn: Embedding Alignment for LLM Unlearning
by: Spohn, Philipp, et al.
Published: (2025)

What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning
by: Javaji, Shashidhar Reddy, et al.
Published: (2024)

Forget What You Know about LLMs Evaluations -- LLMs are Like a Chameleon
by: Cohen-Inger, Nurit, et al.
Published: (2025)

MAGE: All-[MASK] Block Already Knows Where to Look in Diffusion LLM
by: Kwon, Omin, et al.
Published: (2026)

When Models Know More Than They Say: Probing Analogical Reasoning in LLMs
by: McGovern, Hope, et al.
Published: (2026)

Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
by: Li, Zhe, et al.
Published: (2025)

To Believe or Not to Believe Your LLM
by: Yadkori, Yasin Abbasi, et al.
Published: (2024)