:: Library Catalog

Bibliographic Details
Main Author:	Nidhi, Amrit
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.05697

Similar Items

Budgeted LoRA: Distillation as Structured Compute Allocation for Efficient Inference
by: Sabry, Mohammed, et al.
Published: (2026)

Conformal Thinking: Risk Control for Reasoning on a Compute Budget
by: Wang, Xi, et al.
Published: (2026)

One Jump Is All You Need: Short-Cutting Transformers for Early Exit Prediction with One Jump to Fit All Exit Levels
by: Seshadri, Amrit Diggavi
Published: (2025)

DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
by: Hu, Haoyu, et al.
Published: (2026)

LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
by: Shen, Yiqun, et al.
Published: (2025)

Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers
by: Liang, Yingyu, et al.
Published: (2024)

Understanding Dynamic Compute Allocation in Recurrent Transformers
by: Moosa, Ibraheem Muhammad, et al.
Published: (2026)

Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
by: Liu, Baihui, et al.
Published: (2026)

CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs
by: Yao, Zhiyuan, et al.
Published: (2026)

GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets
by: Yao, Zhangyang, et al.
Published: (2026)

Adaptive Budget Allocation for Orthogonal-Subspace Adapter Tuning in LLMs Continual Learning
by: Wan, Zhiyi, et al.
Published: (2025)

BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference
by: Gulhan, Ahmed Burak, et al.
Published: (2025)

BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens
by: Wen, Hao, et al.
Published: (2025)

Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs
by: Alomrani, Mohammad Ali, et al.
Published: (2025)

Joint Optimization of Resource Allocation and Data Selection for Fast and Cost-Efficient Federated Edge Learning
by: Jia, Yunjian, et al.
Published: (2024)

Draft-Conditioned Constrained Decoding for Structured Generation in LLMs
by: Reddy, Avinash, et al.
Published: (2026)

Unveiling and Controlling Anomalous Attention Distribution in Transformers
by: Yan, Ruiqing, et al.
Published: (2024)

ZeroS: Zero-Sum Linear Attention for Efficient Transformers
by: Lu, Jiecheng, et al.
Published: (2026)

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
by: Li, Ziniu, et al.
Published: (2025)

Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
by: Saxena, Krati, et al.
Published: (2025)

LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers
by: Karmore, Aryan
Published: (2026)

Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget
by: Hou, Zhichao, et al.
Published: (2025)

To Trust or Not to Trust: On Calibration in ML-based Resource Allocation for Wireless Networks
by: Raina, Rashika, et al.
Published: (2025)

DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs
by: Song, Mingxuan, et al.
Published: (2026)

Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use
by: Liu, Hanbing, et al.
Published: (2026)

Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation
by: Patel, Bhrij, et al.
Published: (2023)

Attention Needs to Focus: A Unified Perspective on Attention Allocation
by: Fu, Zichuan, et al.
Published: (2026)

ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning
by: Liang, Kun, et al.
Published: (2026)

Fine-Grained Graph Generation through Latent Mixture Scheduling
by: Vakil, Nidhi, et al.
Published: (2026)

VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention
by: Zhou, Jingbo, et al.
Published: (2026)

AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
by: Deiseroth, Björn, et al.
Published: (2023)

Predict-then-Diffuse: Adaptive Response Length for Compute-Budgeted Inference in Diffusion LLMs
by: Rottoli, Michael, et al.
Published: (2026)

Beyond Hard Constraints: Budget-Conditioned Reachability For Safe Offline Reinforcement Learning
by: Brahmanage, Janaka Chathuranga, et al.
Published: (2026)

The Bayesian Geometry of Transformer Attention
by: Agarwal, Naman, et al.
Published: (2025)

Signature-Informed Transformer for Asset Allocation
by: Hwang, Yoontae, et al.
Published: (2025)

Depth-Structured Music Recurrence: Budgeted Recurrent Attention for Full-Piece Symbolic Music Modeling
by: Yi, Yungang, et al.
Published: (2026)

Crossfusor: A Cross-Attention Transformer Enhanced Conditional Diffusion Model for Car-Following Trajectory Prediction
by: You, Junwei, et al.
Published: (2024)

Do Efficient Transformers Really Save Computation?
by: Yang, Kai, et al.
Published: (2024)

Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning
by: Jali, Neharika, et al.
Published: (2026)

Geometric Attention: A Regime-Explicit Operator Semantics for Transformer Attention
by: Freytes, Luis Rosario
Published: (2026)