| Main Authors: | Brown, Katrina, Muppidi, Aneesh, Shahout, Rana |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.01237 |
Similar Items
Queueing, Predictions, and LLMs: Challenges and Open Problems
by: Mitzenmacher, Michael, et al.
Published: (2025)
Fast Inference for Augmented Large Language Models
by: Shahout, Rana, et al.
Published: (2024)
Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning
by: Muppidi, Aneesh, et al.
Published: (2024)
SkipPredict: When to Invest in Predictions for Scheduling
by: Shahout, Rana, et al.
Published: (2024)
Orla: A Library for Serving LLM-Based Multi-Agent Systems
by: Shahout, Rana, et al.
Published: (2026)
PALS: Power-Aware LLM Serving for Mixture-of-Experts Models
by: Hankendi, Can, et al.
Published: (2026)
Permutation Invariant Learning with High-Dimensional Particle Filters
by: Boopathy, Akhilan, et al.
Published: (2024)
CausalVLBench: Benchmarking Visual Causal Reasoning in Large Vision-Language Models
by: Komanduri, Aneesh, et al.
Published: (2025)
From Score Distributions to Balance: Plug-and-Play Mixture-of-Experts Routing
by: Shahout, Rana, et al.
Published: (2025)
GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language Models
by: Hutson, Dylan, et al.
Published: (2025)
TCM-Serve: Modality-aware Scheduling for Multimodal Large Language Model Inference
by: Papaioannou, Konstantinos, et al.
Published: (2026)
PhysicsEval: Inference-Time Techniques to Improve the Reasoning Proficiency of Large Language Models on Physics Problems
by: Siddique, Oshayer, et al.
Published: (2025)
A Survey on Efficient Inference for Large Language Models
by: Zhou, Zixuan, et al.
Published: (2024)
Multimodal Hidden Markov Models for Persistent Emotional State Tracking
by: Ragu, Anamika, et al.
Published: (2026)
Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
by: Sui, Yuan, et al.
Published: (2025)
Real-Time Progress Prediction in Reasoning Language Models
by: Raaschou-Jensen, Hans Peter Lyngsøe, et al.
Published: (2025)
ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models
by: Zheng, Dongqi
Published: (2025)
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
by: Yang, Wang, et al.
Published: (2025)
SkipKV: Selective Skipping of KV Generation and Storage for Efficient Inference with Large Reasoning Models
by: Tian, Jiayi, et al.
Published: (2025)
Efficient Large Language Model Inference with Neural Block Linearization
by: Erdogan, Mete, et al.
Published: (2025)
Plato: Plan to Efficiently Decode for Large Language Model Inference
by: Jin, Shuowei, et al.
Published: (2024)
Dynamic Compressing Prompts for Efficient Inference of Large Language Models
by: Hu, Jinwu, et al.
Published: (2025)
ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching and Token Scheduling
by: He, Xin, et al.
Published: (2024)
MedAdapter: Efficient Test-Time Adaptation of Large Language Models towards Medical Reasoning
by: Shi, Wenqi, et al.
Published: (2024)
TIP-Search: Time-Predictable Inference Scheduling for Market Prediction under Uncertain Load
by: Wang, Xibai
Published: (2025)
Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling
by: Rana, Annu, et al.
Published: (2025)
Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning
by: Fu, Zichuan, et al.
Published: (2026)
DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models
by: Zhang, Ruofan, et al.
Published: (2025)
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models
by: Wang, Rui, et al.
Published: (2025)
Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization
by: Pang, Bowen, et al.
Published: (2025)
Reasoning-Enhanced Large Language Models for Molecular Property Prediction
by: Zhuang, Jiaxi, et al.
Published: (2025)
Combining Constraint Programming Reasoning with Large Language Model Predictions
by: Régin, Florian, et al.
Published: (2024)
ENSI: Efficient Non-Interactive Secure Inference for Large Language Models
by: He, Zhiyu, et al.
Published: (2025)
Optimal Self-Consistency for Efficient Reasoning with Large Language Models
by: Feng, Austin, et al.
Published: (2025)
Human-Alignment and Calibration of Inference-Time Uncertainty in Large Language Models
by: Moore, Kyle, et al.
Published: (2025)
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
by: Yi, Jingyang, et al.
Published: (2025)
Investigating the Potential of Using Large Language Models for Scheduling
by: Jobson, Deddy, et al.
Published: (2024)
Eliciting Reasoning in Language Models with Cognitive Tools
by: Ebouky, Brown, et al.
Published: (2025)
Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models
by: Liu, Yan, et al.
Published: (2026)
A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models
by: Chang, Ching, et al.
Published: (2025)