:: Library Catalog

Bibliographic Details
Main Author:	Bansal, Harsh Vardhan
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.16843

Similar Items

KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing
by: Yang, Yifei, et al.
Published: (2024)

Prompt Cache: Modular Attention Reuse for Low-Latency Inference
by: Gim, In, et al.
Published: (2023)

KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
by: Li, Xing, et al.
Published: (2025)

Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference
by: Taniguchi, Rei, et al.
Published: (2026)

From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation
by: Wang, Jiahao, et al.
Published: (2026)

ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models
by: Sung, Yi-Lin, et al.
Published: (2023)

KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse
by: Yang, Huan, et al.
Published: (2025)

Layer-Wise Evolution of Representations in Fine-Tuned Transformers: Insights from Sparse AutoEncoders
by: Nadipalli, Suneel
Published: (2025)

DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference
by: Dehghanighobadi, Zahra, et al.
Published: (2026)

How to Alleviate Catastrophic Forgetting in LLMs Finetuning? Hierarchical Layer-Wise and Element-Wise Regularization
by: Song, Shezheng, et al.
Published: (2025)

Generative Caching for Structurally Similar Prompts and Responses
by: Chakraborty, Sarthak, et al.
Published: (2025)

Fine-grained Narrative Classification in Biased News Articles
by: Afroz, Zeba, et al.
Published: (2025)

Hidden Heroes and Gradient Bloats: Layer-Wise Redundancy Inverts Attribution in Transformers
by: Ye, Donald
Published: (2026)

KV Cache Transform Coding for Compact Storage in LLM Inference
by: Staniszewski, Konrad, et al.
Published: (2025)

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
by: Monteiro, João, et al.
Published: (2024)

Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues
by: Ou, Jiao, et al.
Published: (2024)

AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
by: Achtibat, Reduan, et al.
Published: (2024)

Transformer-Based Extraction of Statutory Definitions from the U.S. Code
by: Hosabettu, Arpana, et al.
Published: (2025)

Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
by: Shu, Huizhen, et al.
Published: (2025)

WilKE: Wise-Layer Knowledge Editor for Lifelong Knowledge Editing
by: Hu, Chenhui, et al.
Published: (2024)

One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
by: Lu, Liming, et al.
Published: (2026)

BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching
by: Cui, Hanshuai, et al.
Published: (2025)

Crown, Frame, Reverse: Layer-Wise Scaling Variants for LLM Pre-Training
by: Baroian, Andrei, et al.
Published: (2025)

HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference
by: Shi, Zhiyuan, et al.
Published: (2026)

Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
by: Mersha, Melkamu Abay, et al.
Published: (2026)

Open-Source Reproduction and Explainability Analysis of Corrective Retrieval Augmented Generation
by: Yalavarthi, Surya Vardhan
Published: (2026)

dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching
by: Liu, Zhiyuan, et al.
Published: (2025)

LPCD: Unified Framework from Layer-Wise to Submodule Quantization
by: Ichikawa, Yuma, et al.
Published: (2025)

Accelerating Transformer Inference for Translation via Parallel Decoding
by: Santilli, Andrea, et al.
Published: (2023)

VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
by: Tu, Dezhan, et al.
Published: (2024)

Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
by: Mai, Tho, et al.
Published: (2026)

Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?
by: Shah, Syed Huma
Published: (2026)

Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry
by: Sim, Woo Seob, et al.
Published: (2026)

ChatWise: A Strategy-Guided Chatbot for Enhancing Cognitive Support in Older Adults
by: Yang, Zhengbang, et al.
Published: (2025)

Solo Connection: A Parameter Efficient Fine-Tuning Technique for Transformers
by: Pathak, Harsh Nilesh, et al.
Published: (2025)

WorldCache: Content-Aware Caching for Accelerated Video World Models
by: Nawaz, Umair, et al.
Published: (2026)

CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification
by: He, Junhui, et al.
Published: (2024)

LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
by: Kapadia, Shashank, et al.
Published: (2026)

OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference
by: Gu, Yuzhe, et al.
Published: (2025)

Disentangling Direction and Magnitude in Transformer Representations: A Double Dissociation Through L2-Matched Perturbation Analysis
by: Vardhan, Mangadoddi Srikar, et al.
Published: (2026)