| Main Author: | Bansal, Harsh Vardhan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.16843 |
Similar Items
KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing
by: Yang, Yifei, et al.
Published: (2024)
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
by: Gim, In, et al.
Published: (2023)
KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
by: Li, Xing, et al.
Published: (2025)
Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference
by: Taniguchi, Rei, et al.
Published: (2026)
From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation
by: Wang, Jiahao, et al.
Published: (2026)
ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models
by: Sung, Yi-Lin, et al.
Published: (2023)
KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
by: Yang, Huan, et al.
Published: (2025)
Layer-Wise Evolution of Representations in Fine-Tuned Transformers: Insights from Sparse AutoEncoders
by: Nadipalli, Suneel
Published: (2025)
DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference
by: Dehghanighobadi, Zahra, et al.
Published: (2026)
How to Alleviate Catastrophic Forgetting in LLMs Finetuning? Hierarchical Layer-Wise and Element-Wise Regularization
by: Song, Shezheng, et al.
Published: (2025)
Generative Caching for Structurally Similar Prompts and Responses
by: Chakraborty, Sarthak, et al.
Published: (2025)
Fine-grained Narrative Classification in Biased News Articles
by: Afroz, Zeba, et al.
Published: (2025)
Hidden Heroes and Gradient Bloats: Layer-Wise Redundancy Inverts Attribution in Transformers
by: Ye, Donald
Published: (2026)
KV Cache Transform Coding for Compact Storage in LLM Inference
by: Staniszewski, Konrad, et al.
Published: (2025)
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
by: Monteiro, João, et al.
Published: (2024)
Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues
by: Ou, Jiao, et al.
Published: (2024)
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
by: Achtibat, Reduan, et al.
Published: (2024)
Transformer-Based Extraction of Statutory Definitions from the U.S. Code
by: Hosabettu, Arpana, et al.
Published: (2025)
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
by: Shu, Huizhen, et al.
Published: (2025)
WilKE: Wise-Layer Knowledge Editor for Lifelong Knowledge Editing
by: Hu, Chenhui, et al.
Published: (2024)
One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
by: Lu, Liming, et al.
Published: (2026)
BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching
by: Cui, Hanshuai, et al.
Published: (2025)
Crown, Frame, Reverse: Layer-Wise Scaling Variants for LLM Pre-Training
by: Baroian, Andrei, et al.
Published: (2025)
HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference
by: Shi, Zhiyuan, et al.
Published: (2026)
Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
by: Mersha, Melkamu Abay, et al.
Published: (2026)
Open-Source Reproduction and Explainability Analysis of Corrective Retrieval Augmented Generation
by: Yalavarthi, Surya Vardhan
Published: (2026)
dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching
by: Liu, Zhiyuan, et al.
Published: (2025)
LPCD: Unified Framework from Layer-Wise to Submodule Quantization
by: Ichikawa, Yuma, et al.
Published: (2025)
Accelerating Transformer Inference for Translation via Parallel Decoding
by: Santilli, Andrea, et al.
Published: (2023)
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
by: Tu, Dezhan, et al.
Published: (2024)
Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
by: Mai, Tho, et al.
Published: (2026)
Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?
by: Shah, Syed Huma
Published: (2026)
Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry
by: Sim, Woo Seob, et al.
Published: (2026)
ChatWise: A Strategy-Guided Chatbot for Enhancing Cognitive Support in Older Adults
by: Yang, Zhengbang, et al.
Published: (2025)
Solo Connection: A Parameter Efficient Fine-Tuning Technique for Transformers
by: Pathak, Harsh Nilesh, et al.
Published: (2025)
WorldCache: Content-Aware Caching for Accelerated Video World Models
by: Nawaz, Umair, et al.
Published: (2026)
CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification
by: He, Junhui, et al.
Published: (2024)
LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
by: Kapadia, Shashank, et al.
Published: (2026)
OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference
by: Gu, Yuzhe, et al.
Published: (2025)
Disentangling Direction and Magnitude in Transformer Representations: A Double Dissociation Through L2-Matched Perturbation Analysis
by: Vardhan, Mangadoddi Srikar, et al.
Published: (2026)