| Main Author: | Kotte, Varun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.18796 |
Similar Items
PASC: Pipeline-Aware Conformal Prediction with Joint Coverage Guarantees for Multi-Stage NLP and LLM Pipelines
by: Kotte, Varun
Published: (2026)
PromptPort: A Reliability Layer for Cross-Model Structured Extraction
by: Kotte, Varun
Published: (2026)
Not All Queries Need Rewriting: When Prompt-Only LLM Refinement Helps and Hurts Dense Retrieval
by: Kotte, Varun
Published: (2026)
Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
by: Ding, Dujian, et al.
Published: (2024)
BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute
by: Ding, Dujian, et al.
Published: (2025)
Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge
by: Zhang, Wenbo, et al.
Published: (2026)
Aligning LLMs with Human Uncertainty: A Beta-Bernoulli Calibrator for LLM Forecasting
by: Dai, Hui, et al.
Published: (2026)
Retrieval Augmented Generation for Domain-specific Question Answering
by: Sharma, Sanat, et al.
Published: (2024)
Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization
by: Chuang, Yu-Neng, et al.
Published: (2025)
DTRNet: Dynamic Token Routing Network to Reduce Quadratic Costs in Transformers
by: Sharma, Aman, et al.
Published: (2025)
Beyond the Score: Uncertainty-Calibrated LLMs for Automated Essay Assessment
by: Karim, Ahmed, et al.
Published: (2025)
Cascade Speculative Drafting for Even Faster LLM Inference
by: Chen, Ziyi, et al.
Published: (2023)
LLM Router: Rethinking Routing with Prefill Activations
by: Varshney, Tanay, et al.
Published: (2026)
Universal Model Routing for Efficient LLM Inference
by: Jitkrittum, Wittawat, et al.
Published: (2025)
Task-Aware Calibration: Provably Optimal Decoding in LLMs
by: Tomov, Tim, et al.
Published: (2026)
RouteLLM: Learning to Route LLMs with Preference Data
by: Ong, Isaac, et al.
Published: (2024)
Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing
by: Lai, Kunfeng, et al.
Published: (2025)
Why Semantic Entropy Fails: Geometry-Aware and Calibrated Uncertainty for Policy Optimization
by: Zhang, Zheyuan, et al.
Published: (2026)
Benchmarking Uncertainty Calibration in Large Language Model Long-Form Question Answering
by: Müller, Philip, et al.
Published: (2026)
Adaptive Multi-Expert Reasoning via Difficulty-Aware Routing and Uncertainty-Guided Aggregation
by: Ehab, Mohamed, et al.
Published: (2026)
Process Supervision of Confidence Margin for Calibrated LLM Reasoning
by: Wang, Liaoyaqi, et al.
Published: (2026)
Revisiting Uncertainty Estimation and Calibration of Large Language Models
by: Tao, Linwei, et al.
Published: (2025)
Uncertainty in Language Models: Assessment through Rank-Calibration
by: Huang, Xinmeng, et al.
Published: (2024)
On Subjective Uncertainty Quantification and Calibration in Natural Language Generation
by: Wang, Ziyu, et al.
Published: (2024)
Reward-Based Online LLM Routing via NeuralUCB
by: Tsai, Ming-Hua, et al.
Published: (2026)
RouteNLP: Closed-Loop LLM Routing with Conformal Cascading and Distillation Co-Optimization
by: Guo, Dongxin, et al.
Published: (2026)
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
by: Yue, Murong, et al.
Published: (2023)
DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization
by: Shao, Yuantian, et al.
Published: (2025)
TweakLLM: A Routing Architecture for Dynamic Tailoring of Cached Responses
by: Cheema, Muhammad Taha, et al.
Published: (2025)
Continuous Semantic Caching for Low-Cost LLM Serving
by: Atalar, Baran, et al.
Published: (2026)
Rubric-Conditioned LLM Grading: Alignment, Uncertainty, and Robustness
by: Deng, Haotian, et al.
Published: (2025)
Estimating Semantic Alphabet Size for LLM Uncertainty Quantification
by: McCabe, Lucas H., et al.
Published: (2025)
SAUP: Situation Awareness Uncertainty Propagation on LLM Agent
by: Zhao, Qiwei, et al.
Published: (2024)
Dr.LLM: Dynamic Layer Routing in LLMs
by: Heakl, Ahmed, et al.
Published: (2025)
TCUQ: Single-Pass Uncertainty Quantification from Temporal Consistency with Streaming Conformal Calibration for TinyML
by: Lamaakal, Ismail, et al.
Published: (2025)
An Assessment of Human vs. Model Uncertainty in Soft-Label Learning and Calibration
by: Pavlovic, Maja, et al.
Published: (2026)
Is Escalation Worth It? A Decision-Theoretic Characterization of LLM Cascades
by: Bouchard, Dylan
Published: (2026)
Pause Tokens Strictly Increase the Expressivity of Constant-Depth Transformers
by: London, Charles, et al.
Published: (2025)
PsychePass: Calibrating LLM Therapeutic Competence via Trajectory-Anchored Tournaments
by: Chen, Zhuang, et al.
Published: (2026)
HawkesLLM: Semantic Uncertainty Propagation in Agentic Text Simulation
by: Deng, Zewei, et al.
Published: (2026)