| Main Author: | Shah, Harsh |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.27641 |
Similar Items
SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
by: Shukla, Shikhar
Published: (2026)
ProxyAttn: Guided Sparse Attention via Representative Heads
by: Wang, Yixuan, et al.
Published: (2025)
ParallelSpec: Parallel Drafter for Efficient Speculative Decoding
by: Xiao, Zilin, et al.
Published: (2024)
State Space Models as Foundation Models: A Control Theoretic Overview
by: Alonso, Carmen Amo, et al.
Published: (2024)
SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems
by: Emami, Patrick, et al.
Published: (2024)
STS: Efficient Sparse Attention with Speculative Token Sparsity
by: Xu, Ceyu, et al.
Published: (2026)
HiSpec: Hierarchical Speculative Decoding for LLMs
by: Kumar, Avinash, et al.
Published: (2025)
AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache
by: Song, Dinghong, et al.
Published: (2025)
ACING: Actor-Critic for Instruction Learning in Black-Box LLMs
by: Kharrat, Salma, et al.
Published: (2024)
ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise
by: Guo, Xingang, et al.
Published: (2024)
Fake-Mamba: Real-Time Speech Deepfake Detection Using Bidirectional Mamba as Self-Attention's Alternative
by: Xuan, Xi, et al.
Published: (2025)
SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding
by: Plaksin, Anton, et al.
Published: (2026)
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
by: Zhou, Yongchao, et al.
Published: (2023)
LogicGuard: Improving Embodied LLM agents through Temporal Logic based Critics
by: Gokhale, Anand, et al.
Published: (2025)
Fine-tuning Smaller Language Models for Question Answering over Financial Documents
by: Phogat, Karmvir Singh, et al.
Published: (2024)
Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
by: Chen, Lingjiao, et al.
Published: (2024)
Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement
by: Xie, Guanwen, et al.
Published: (2024)
PreFT: Prefill-only finetuning for efficient inference
by: Lanpouthakoun, Andrew, et al.
Published: (2026)
Most Likely Sequence Generation for $n$-Grams, Transformers, HMMs, and Markov Chains, by Using Rollout Algorithms
by: Li, Yuchao, et al.
Published: (2024)
Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
by: Bogdanov, Igor, et al.
Published: (2026)
FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast
by: Bogdanov, Igor, et al.
Published: (2026)
Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins
by: Yin, Yanlei, et al.
Published: (2024)
ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts
by: Georganas, Evangelos, et al.
Published: (2025)
SpecExit: Accelerating Large Reasoning Model via Speculative Exit
by: Yang, Rubing, et al.
Published: (2025)
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
by: Huang, Kaixuan, et al.
Published: (2024)
Multi-Bin Batching for Increasing LLM Inference Throughput
by: Guldogan, Ozgur, et al.
Published: (2024)
Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations
by: Sitdhipol, Supawich, et al.
Published: (2025)
On the Relation of State Space Models and Hidden Markov Models
by: Ghojogh, Aydin, et al.
Published: (2026)
A Survey on Large Language Model-empowered Autonomous Driving
by: Zhu, Yuxuan, et al.
Published: (2024)
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
by: Yang, Penghui, et al.
Published: (2025)
SpecBound: Adaptive Bounded Self-Speculation with Layer-wise Confidence Calibration
by: Wen, Zhuofan, et al.
Published: (2026)
Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from Demonstrations
by: Wang, Yanwei, et al.
Published: (2022)
DynaSpec: Context-aware Dynamic Speculative Sampling for Large-Vocabulary Language Models
by: Zhang, Jinbin, et al.
Published: (2025)
SpecTr: Fast Speculative Decoding via Optimal Transport
by: Sun, Ziteng, et al.
Published: (2023)
SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding
by: Li, Shenggui, et al.
Published: (2026)
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
by: Achtibat, Reduan, et al.
Published: (2024)
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
by: Mitra, Purbesh, et al.
Published: (2025)
FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation
by: Mitra, Shaswata, et al.
Published: (2025)
A Unified Generative-AI Framework for Smart Energy Infrastructure: Intelligent Gas Distribution, Utility Billing, Carbon Analytics, and Quantum-Inspired Optimisation
by: Manjunath, Pavan, et al.
Published: (2026)
Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
by: Liu, Qin, et al.
Published: (2024)