Library Catalog

Bibliographic Details
Main Authors:	Belitsky, Max; Kopiczko, Dawid J.; Dorkenwald, Michael; Mirza, M. Jehanzeb; Glass, James R.; Snoek, Cees G. M.; Asano, Yuki M.
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.08799

Similar Items

What Layers When: Learning to Skip Compute in LLMs with Residual Gates
by: Laitenberger, Filipe, et al.
Published: (2025)

Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs
by: Kopiczko, Dawid J., et al.
Published: (2024)

VeRA: Vector-based Random Matrix Adaptation
by: Kopiczko, Dawid J., et al.
Published: (2023)

PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
by: Dorkenwald, Michael, et al.
Published: (2024)

Lost in Time: A New Temporal Benchmark for VideoLLMs
by: Cores, Daniel, et al.
Published: (2024)

Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning
by: Kopiczko, Dawid J., et al.
Published: (2026)

Elastic ViTs from Pretrained Models without Retraining
by: Simoncini, Walter, et al.
Published: (2025)

SIGMA: Sinkhorn-Guided Masked Video Modeling
by: Salehi, Mohammadreza, et al.
Published: (2024)

SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery
by: Rastegar, Sarah, et al.
Published: (2024)

NeoBabel: A Multilingual Open Tower for Visual Generation
by: Derakhshani, Mohammad Mahdi, et al.
Published: (2025)

Overflow Prevention Enhances Long-Context Recurrent LLMs
by: Ben-Kish, Assaf, et al.
Published: (2025)

Beyond Model Adaptation at Test Time: A Survey
by: Xiao, Zehao, et al.
Published: (2024)

CALM: Class-Conditional Sparse Attention Vectors for Large Audio-Language Models
by: Mehta, Videet, et al.
Published: (2026)

Beyond KV Caching: Shared Attention for Efficient LLMs
by: Liao, Bingli, et al.
Published: (2024)

Segment Any 3D-Part in a Scene from a Sentence
by: Wu, Hongyu, et al.
Published: (2025)

Attention Is All You Need for KV Cache in Diffusion LLMs
by: Nguyen-Tri, Quan, et al.
Published: (2025)

Crystal-KV: Efficient KV Cache Management for Chain-of-Thought LLMs via Answer-First Principle
by: Wang, Zihan, et al.
Published: (2026)

IPO: Interpretable Prompt Optimization for Vision-Language Models
by: Du, Yingjun, et al.
Published: (2024)

Lossless KV Cache Compression to 2%
by: Yang, Zhen, et al.
Published: (2024)

G-KV: Decoding-Time KV Cache Eviction with Global Attention
by: Liao, Mengqi, et al.
Published: (2025)

R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
by: Cai, Zefan, et al.
Published: (2025)

Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs
by: Liu, Andy Zeyi, et al.
Published: (2026)

ZSMerge: Zero-Shot KV Cache Compression for Memory-Efficient Long-Context LLMs
by: Liu, Xin, et al.
Published: (2025)

Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs
by: Ghadia, Ravi, et al.
Published: (2025)

HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs
by: Yang, Dongquan, et al.
Published: (2025)

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
by: Cai, Zefan, et al.
Published: (2024)

LocoMotion: Learning Motion-Focused Video-Language Representations
by: Doughty, Hazel, et al.
Published: (2024)

RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations
by: Su, Zunhai, et al.
Published: (2025)

Towards Threshold-Free KV Cache Pruning
by: Ni, Xuanfan, et al.
Published: (2025)

KVSculpt: KV Cache Compression as Distillation
by: Jiang, Bo, et al.
Published: (2026)

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)

Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering
by: Gupta, Manan, et al.
Published: (2026)

Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
by: Liu, Huabin, et al.
Published: (2025)

DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference
by: Dehghanighobadi, Zahra, et al.
Published: (2026)

DeltaKV: Residual-Based KV Cache Compression via Long-Range Similarity
by: Hao, Jitai, et al.
Published: (2026)

Stateful KV Cache Management for LLMs: Balancing Space, Time, Accuracy, and Positional Fidelity
by: Poudel, Pratik
Published: (2025)

OjaKV: Context-Aware Online Low-Rank KV Cache Compression
by: Zhu, Yuxuan, et al.
Published: (2025)

Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
by: Feng, Yuan, et al.
Published: (2024)

Redefining Normal: A Novel Object-Level Approach for Multi-Object Novelty Detection
by: Salehi, Mohammadreza, et al.
Published: (2024)

KV Cache Offloading for Context-Intensive Tasks
by: Bocharnikov, Andrey, et al.
Published: (2026)