| Main Authors: | Belitsky, Max, Kopiczko, Dawid J., Dorkenwald, Michael, Mirza, M. Jehanzeb, Glass, James R., Snoek, Cees G. M., Asano, Yuki M. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.08799 |
Similar Items
What Layers When: Learning to Skip Compute in LLMs with Residual Gates
by: Laitenberger, Filipe, et al.
Published: (2025)
Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs
by: Kopiczko, Dawid J., et al.
Published: (2024)
VeRA: Vector-based Random Matrix Adaptation
by: Kopiczko, Dawid J., et al.
Published: (2023)
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
by: Dorkenwald, Michael, et al.
Published: (2024)
Lost in Time: A New Temporal Benchmark for VideoLLMs
by: Cores, Daniel, et al.
Published: (2024)
Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning
by: Kopiczko, Dawid J., et al.
Published: (2026)
Elastic ViTs from Pretrained Models without Retraining
by: Simoncini, Walter, et al.
Published: (2025)
SIGMA: Sinkhorn-Guided Masked Video Modeling
by: Salehi, Mohammadreza, et al.
Published: (2024)
SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery
by: Rastegar, Sarah, et al.
Published: (2024)
NeoBabel: A Multilingual Open Tower for Visual Generation
by: Derakhshani, Mohammad Mahdi, et al.
Published: (2025)
Overflow Prevention Enhances Long-Context Recurrent LLMs
by: Ben-Kish, Assaf, et al.
Published: (2025)
Beyond Model Adaptation at Test Time: A Survey
by: Xiao, Zehao, et al.
Published: (2024)
CALM: Class-Conditional Sparse Attention Vectors for Large Audio-Language Models
by: Mehta, Videet, et al.
Published: (2026)
Beyond KV Caching: Shared Attention for Efficient LLMs
by: Liao, Bingli, et al.
Published: (2024)
Segment Any 3D-Part in a Scene from a Sentence
by: Wu, Hongyu, et al.
Published: (2025)
Attention Is All You Need for KV Cache in Diffusion LLMs
by: Nguyen-Tri, Quan, et al.
Published: (2025)
Crystal-KV: Efficient KV Cache Management for Chain-of-Thought LLMs via Answer-First Principle
by: Wang, Zihan, et al.
Published: (2026)
IPO: Interpretable Prompt Optimization for Vision-Language Models
by: Du, Yingjun, et al.
Published: (2024)
Lossless KV Cache Compression to 2%
by: Yang, Zhen, et al.
Published: (2024)
G-KV: Decoding-Time KV Cache Eviction with Global Attention
by: Liao, Mengqi, et al.
Published: (2025)
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
by: Cai, Zefan, et al.
Published: (2025)
Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs
by: Liu, Andy Zeyi, et al.
Published: (2026)
ZSMerge: Zero-Shot KV Cache Compression for Memory-Efficient Long-Context LLMs
by: Liu, Xin, et al.
Published: (2025)
Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs
by: Ghadia, Ravi, et al.
Published: (2025)
HCAttention: Extreme KV Cache Compression via Heterogeneous Attention Computing for LLMs
by: Yang, Dongquan, et al.
Published: (2025)
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
by: Cai, Zefan, et al.
Published: (2024)
LocoMotion: Learning Motion-Focused Video-Language Representations
by: Doughty, Hazel, et al.
Published: (2024)
RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations
by: Su, Zunhai, et al.
Published: (2025)
Towards Threshold-Free KV Cache Pruning
by: Ni, Xuanfan, et al.
Published: (2025)
KVSculpt: KV Cache Compression as Distillation
by: Jiang, Bo, et al.
Published: (2026)
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)
Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering
by: Gupta, Manan, et al.
Published: (2026)
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
by: Liu, Huabin, et al.
Published: (2025)
DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference
by: Dehghanighobadi, Zahra, et al.
Published: (2026)
DeltaKV: Residual-Based KV Cache Compression via Long-Range Similarity
by: Hao, Jitai, et al.
Published: (2026)
Stateful KV Cache Management for LLMs: Balancing Space, Time, Accuracy, and Positional Fidelity
by: Poudel, Pratik
Published: (2025)
OjaKV: Context-Aware Online Low-Rank KV Cache Compression
by: Zhu, Yuxuan, et al.
Published: (2025)
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
by: Feng, Yuan, et al.
Published: (2024)
Redefining Normal: A Novel Object-Level Approach for Multi-Object Novelty Detection
by: Salehi, Mohammadreza, et al.
Published: (2024)
KV Cache Offloading for Context-Intensive Tasks
by: Bocharnikov, Andrey, et al.
Published: (2026)