| Main Authors: | Jaiswal, Ajay, Hannah, Lauren, Kim, Han-Byul, Hoang, Duc, Kundu, Arnav, Farajtabar, Mehrdad, Cho, Minsik |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.00398 |
Similar Items
TIDE: Every Layer Knows the Token Beneath the Context
by: Jaiswal, Ajay, et al.
Published: (2026)
SPD: Sync-Point Drop for Efficient Tensor Parallelism of Large Language Models
by: Kim, Han-Byul, et al.
Published: (2025)
EpiCache: Episodic KV Cache Management for Long-Term Conversation on Resource-Constrained Environments
by: Kim, Minsoo, et al.
Published: (2025)
MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
by: Hannah, Lauren. A, et al.
Published: (2025)
Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential
by: Samragh, Mohammad, et al.
Published: (2025)
SpecMD: A Comprehensive Study On Speculative Expert Prefetching
by: Hoang, Duc, et al.
Published: (2026)
Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why
by: Armandpour, Mohammadreza, et al.
Published: (2026)
Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context
by: Alizadeh, Keivan, et al.
Published: (2026)
M+: Extending MemoryLLM with Scalable Long-Term Memory
by: Wang, Yu, et al.
Published: (2025)
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
by: Alizadeh, Keivan, et al.
Published: (2023)
MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE
by: Zibakhsh, Soheil, et al.
Published: (2025)
Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
by: Bhendawade, Nikhil, et al.
Published: (2025)
R2 Loss: Range Restriction Loss for Model Compression and Quantization
by: Kundu, Arnav, et al.
Published: (2023)
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
by: Alizadeh, Keivan, et al.
Published: (2024)
FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping
by: Jaiswal, Ajay, et al.
Published: (2024)
TS-Memory: Plug-and-Play Memory for Time Series Foundation Models
by: Lyu, Sisuo, et al.
Published: (2026)
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
by: Samragh, Mohammad, et al.
Published: (2024)
Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
by: Cao, Jiaqi, et al.
Published: (2025)
From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs
by: Nishu, Kumari, et al.
Published: (2025)
Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications
by: Hoang, Duc N. M, et al.
Published: (2023)
Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction
by: Sarkar, Aditya, et al.
Published: (2026)
NGM: A Plug-and-Play Training-Free Memory Module for LLMs
by: Qu, Yuwen, et al.
Published: (2026)
Self-supervised Deep Hyperspectral Inpainting with the Plug and Play and Deep Image Prior Models
by: Li, Shuo, et al.
Published: (2025)
Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play Algorithms
by: Kowalski, Matthieu, et al.
Published: (2024)
Streaming Anchor Loss: Augmenting Supervision with Temporal Significance
by: Sarawgi, Utkarsh Oggy, et al.
Published: (2023)
Romanization-Induced Mispronunciations in Korean: How Latin Letters Alter the Perception of Japanese Voiceless Consonants
by: Kang, Byul
Published: (2025)
KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
by: Cho, Minsik, et al.
Published: (2024)
PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents
by: Yang, Ke, et al.
Published: (2026)
Uniform boundedness on rational maps with automorphisms
by: Han, Minsik
Published: (2024)
A Study of Student Dependency on Artificial Intelligence Applications in their Education: With Reference to Indore City
by: Ajay Jaiswal
Published: (2025)
Topological transition as a percolation of the Berry curvature
by: Kim, Han-Byul, et al.
Published: (2024)
PEMA: An Offsite-Tunable Plug-in External Memory Adaptation for Language Models
by: Kim, HyunJin, et al.
Published: (2023)
Online Temporal Action Localization with Memory-Augmented Transformer
by: Song, Youngkil, et al.
Published: (2024)
MemOrb: A Plug-and-Play Verbal-Reinforcement Memory Layer for E-Commerce Customer Service
by: Huang, Yizhe, et al.
Published: (2025)
Towards Low-bit Communication for Tensor Parallel LLM Inference
by: Dong, Harry, et al.
Published: (2024)
Safe Memory Reclamation Techniques
by: Singh, Ajay
Published: (2025)
F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting
by: Kim, Injae, et al.
Published: (2026)
ProTransformer: Robustify Transformers via Plug-and-Play Paradigm
by: Hou, Zhichao, et al.
Published: (2024)
NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
by: Jeong, Yoonwoo, et al.
Published: (2023)
Plug-and-Play Transformer Modules for Test-Time Adaptation
by: Chang, Xiangyu, et al.
Published: (2024)