:: Library Catalog

Bibliographic Details
Main Authors:	Jaiswal, Ajay, Hannah, Lauren, Kim, Han-Byul, Hoang, Duc, Kundu, Arnav, Farajtabar, Mehrdad, Cho, Minsik
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.00398

Similar Items

TIDE: Every Layer Knows the Token Beneath the Context
by: Jaiswal, Ajay, et al.
Published: (2026)

SPD: Sync-Point Drop for Efficient Tensor Parallelism of Large Language Models
by: Kim, Han-Byul, et al.
Published: (2025)

EpiCache: Episodic KV Cache Management for Long-Term Conversation on Resource-Constrained Environments
by: Kim, Minsoo, et al.
Published: (2025)

MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
by: Hannah, Lauren A., et al.
Published: (2025)

Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential
by: Samragh, Mohammad, et al.
Published: (2025)

SpecMD: A Comprehensive Study On Speculative Expert Prefetching
by: Hoang, Duc, et al.
Published: (2026)

Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why
by: Armandpour, Mohammadreza, et al.
Published: (2026)

Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context
by: Alizadeh, Keivan, et al.
Published: (2026)

M+: Extending MemoryLLM with Scalable Long-Term Memory
by: Wang, Yu, et al.
Published: (2025)

LLM in a flash: Efficient Large Language Model Inference with Limited Memory
by: Alizadeh, Keivan, et al.
Published: (2023)

MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE
by: Zibakhsh, Soheil, et al.
Published: (2025)

Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
by: Bhendawade, Nikhil, et al.
Published: (2025)

R2 Loss: Range Restriction Loss for Model Compression and Quantization
by: Kundu, Arnav, et al.
Published: (2023)

Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
by: Alizadeh, Keivan, et al.
Published: (2024)

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping
by: Jaiswal, Ajay, et al.
Published: (2024)

TS-Memory: Plug-and-Play Memory for Time Series Foundation Models
by: Lyu, Sisuo, et al.
Published: (2026)

Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
by: Samragh, Mohammad, et al.
Published: (2024)

Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
by: Cao, Jiaqi, et al.
Published: (2025)

From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs
by: Nishu, Kumari, et al.
Published: (2025)

Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications
by: Hoang, Duc N. M., et al.
Published: (2023)

Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction
by: Sarkar, Aditya, et al.
Published: (2026)

NGM: A Plug-and-Play Training-Free Memory Module for LLMs
by: Qu, Yuwen, et al.
Published: (2026)

Self-supervised Deep Hyperspectral Inpainting with the Plug and Play and Deep Image Prior Models
by: Li, Shuo, et al.
Published: (2025)

Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play Algorithms
by: Kowalski, Matthieu, et al.
Published: (2024)

Streaming Anchor Loss: Augmenting Supervision with Temporal Significance
by: Sarawgi, Utkarsh Oggy, et al.
Published: (2023)

Romanization-Induced Mispronunciations in Korean: How Latin Letters Alter the Perception of Japanese Voiceless Consonants
by: Kang, Byul
Published: (2025)

KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
by: Cho, Minsik, et al.
Published: (2024)

PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents
by: Yang, Ke, et al.
Published: (2026)

Uniform boundedness on rational maps with automorphisms
by: Han, Minsik
Published: (2024)

A Study of Student Dependency on Artificial Intelligence Applications in their Education: With Reference to Indore City
by: Jaiswal, Ajay
Published: (2025)

Topological transition as a percolation of the Berry curvature
by: Kim, Han-Byul, et al.
Published: (2024)

PEMA: An Offsite-Tunable Plug-in External Memory Adaptation for Language Models
by: Kim, HyunJin, et al.
Published: (2023)

Online Temporal Action Localization with Memory-Augmented Transformer
by: Song, Youngkil, et al.
Published: (2024)

MemOrb: A Plug-and-Play Verbal-Reinforcement Memory Layer for E-Commerce Customer Service
by: Huang, Yizhe, et al.
Published: (2025)

Towards Low-bit Communication for Tensor Parallel LLM Inference
by: Dong, Harry, et al.
Published: (2024)

Safe Memory Reclamation Techniques
by: Singh, Ajay
Published: (2025)

F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting
by: Kim, Injae, et al.
Published: (2026)

ProTransformer: Robustify Transformers via Plug-and-Play Paradigm
by: Hou, Zhichao, et al.
Published: (2024)

NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
by: Jeong, Yoonwoo, et al.
Published: (2023)

Plug-and-Play Transformer Modules for Test-Time Adaptation
by: Chang, Xiangyu, et al.
Published: (2024)