| Main Author: | Nidhi, Amrit |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.05697 |
Similar Items
Budgeted LoRA: Distillation as Structured Compute Allocation for Efficient Inference
by: Sabry, Mohammed, et al.
Published: (2026)
Conformal Thinking: Risk Control for Reasoning on a Compute Budget
by: Wang, Xi, et al.
Published: (2026)
One Jump Is All You Need: Short-Cutting Transformers for Early Exit Prediction with One Jump to Fit All Exit Levels
by: Seshadri, Amrit Diggavi
Published: (2025)
DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
by: Hu, Haoyu, et al.
Published: (2026)
LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
by: Shen, Yiqun, et al.
Published: (2025)
Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers
by: Liang, Yingyu, et al.
Published: (2024)
Understanding Dynamic Compute Allocation in Recurrent Transformers
by: Moosa, Ibraheem Muhammad, et al.
Published: (2026)
Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
by: Liu, Baihui, et al.
Published: (2026)
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs
by: Yao, Zhiyuan, et al.
Published: (2026)
GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets
by: Yao, Zhangyang, et al.
Published: (2026)
Adaptive Budget Allocation for Orthogonal-Subspace Adapter Tuning in LLMs Continual Learning
by: Wan, Zhiyi, et al.
Published: (2025)
BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference
by: Gulhan, Ahmed Burak, et al.
Published: (2025)
BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens
by: Wen, Hao, et al.
Published: (2025)
Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs
by: Alomrani, Mohammad Ali, et al.
Published: (2025)
Joint Optimization of Resource Allocation and Data Selection for Fast and Cost-Efficient Federated Edge Learning
by: Jia, Yunjian, et al.
Published: (2024)
Draft-Conditioned Constrained Decoding for Structured Generation in LLMs
by: Reddy, Avinash, et al.
Published: (2026)
Unveiling and Controlling Anomalous Attention Distribution in Transformers
by: Yan, Ruiqing, et al.
Published: (2024)
ZeroS: Zero-Sum Linear Attention for Efficient Transformers
by: Lu, Jiecheng, et al.
Published: (2026)
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
by: Li, Ziniu, et al.
Published: (2025)
Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
by: Saxena, Krati, et al.
Published: (2025)
LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers
by: Karmore, Aryan
Published: (2026)
Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget
by: Hou, Zhichao, et al.
Published: (2025)
To Trust or Not to Trust: On Calibration in ML-based Resource Allocation for Wireless Networks
by: Raina, Rashika, et al.
Published: (2025)
DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs
by: Song, Mingxuan, et al.
Published: (2026)
Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use
by: Liu, Hanbing, et al.
Published: (2026)
Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation
by: Patel, Bhrij, et al.
Published: (2023)
Attention Needs to Focus: A Unified Perspective on Attention Allocation
by: Fu, Zichuan, et al.
Published: (2026)
ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning
by: Liang, Kun, et al.
Published: (2026)
Fine-Grained Graph Generation through Latent Mixture Scheduling
by: Vakil, Nidhi, et al.
Published: (2026)
VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention
by: Zhou, Jingbo, et al.
Published: (2026)
AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
by: Deiseroth, Björn, et al.
Published: (2023)
Predict-then-Diffuse: Adaptive Response Length for Compute-Budgeted Inference in Diffusion LLMs
by: Rottoli, Michael, et al.
Published: (2026)
Beyond Hard Constraints: Budget-Conditioned Reachability For Safe Offline Reinforcement Learning
by: Brahmanage, Janaka Chathuranga, et al.
Published: (2026)
The Bayesian Geometry of Transformer Attention
by: Agarwal, Naman, et al.
Published: (2025)
Signature-Informed Transformer for Asset Allocation
by: Hwang, Yoontae, et al.
Published: (2025)
Depth-Structured Music Recurrence: Budgeted Recurrent Attention for Full-Piece Symbolic Music Modeling
by: Yi, Yungang, et al.
Published: (2026)
Crossfusor: A Cross-Attention Transformer Enhanced Conditional Diffusion Model for Car-Following Trajectory Prediction
by: You, Junwei, et al.
Published: (2024)
Do Efficient Transformers Really Save Computation?
by: Yang, Kai, et al.
Published: (2024)
Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning
by: Jali, Neharika, et al.
Published: (2026)
Geometric Attention: A Regime-Explicit Operator Semantics for Transformer Attention
by: Freytes, Luis Rosario
Published: (2026)