Library Catalog

Bibliographic Details
Main Authors:	Agrawal, Rishabh; Kumar, Himanshu
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.08261

Similar Items

More Than a Quick Glance: Overcoming the Greedy Bias in KV-Cache Compression
by: Sood, Aryan, et al.
Published: (2026)

CacheFocus: Dynamic Cache Re-Positioning for Efficient Retrieval-Augmented Generation
by: Lee, Kun-Hui, et al.
Published: (2025)

From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation
by: Wang, Jiahao, et al.
Published: (2026)

Lossless KV Cache Compression to 2%
by: Yang, Zhen, et al.
Published: (2024)

Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection
by: Şenol, Ali, et al.
Published: (2025)

CAG: Chunked Augmented Generation for Google Chrome's Built-in Gemini Nano
by: Surulimuthu, Vivek Vellaiyappan, et al.
Published: (2024)

Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion
by: Baek, Jinheon, et al.
Published: (2023)

Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
by: Fu, Yu, et al.
Published: (2024)

Enhancing RAG Efficiency with Adaptive Context Compression
by: Guo, Shuyu, et al.
Published: (2025)

Conversation AI Dialog for Medicare powered by Finetuning and Retrieval Augmented Generation
by: Agrawal, Atharva Mangeshkumar, et al.
Published: (2025)

One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
by: Lu, Liming, et al.
Published: (2026)

"Lost-in-the-Later": Framework for Quantifying Contextual Grounding in Large Language Models
by: Tao, Yufei, et al.
Published: (2025)

Use Graph When It Needs: Efficiently and Adaptively Integrating Retrieval-Augmented Generation with Graphs
by: Dong, Su, et al.
Published: (2026)

EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation
by: Hwang, Taeho, et al.
Published: (2024)

KVSculpt: KV Cache Compression as Distillation
by: Jiang, Bo, et al.
Published: (2026)

CommVQ: Commutative Vector Quantization for KV Cache Compression
by: Li, Junyan, et al.
Published: (2025)

HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference
by: Shi, Zhiyuan, et al.
Published: (2026)

Retrieval-Augmented Generation with Hierarchical Knowledge
by: Huang, Haoyu, et al.
Published: (2025)

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
by: Liu, Akide, et al.
Published: (2024)

Personas within Parameters: Fine-Tuning Small Language Models with Low-Rank Adapters to Mimic User Behaviors
by: Thakur, Himanshu, et al.
Published: (2025)

Detecting and Mitigating Bias in LLMs through Knowledge Graph-Augmented Training
by: Kumar, Rajeev, et al.
Published: (2025)

Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning
by: Corallo, Giulio, et al.
Published: (2025)

Enhancing Presentation Slide Generation by LLMs with a Multi-Staged End-to-End Approach
by: Bandyopadhyay, Sambaran, et al.
Published: (2024)

KVReviver: Reversible KV Cache Compression with Sketch-Based Token Reconstruction
by: Yuan, Aomufei, et al.
Published: (2025)

Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression
by: Godey, Nathan, et al.
Published: (2025)

R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
by: Cai, Zefan, et al.
Published: (2025)

EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection
by: Zhou, Yuhao, et al.
Published: (2025)

ARise: Towards Knowledge-Augmented Reasoning via Risk-Adaptive Search
by: Zhang, Yize, et al.
Published: (2025)

Knowledge Graph-Guided Retrieval Augmented Generation
by: Zhu, Xiangrong, et al.
Published: (2025)

Contextual Reinforcement in Multimodal Token Compression for Large Language Models
by: Piero, Naderdel, et al.
Published: (2025)

SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression
by: Jin, Yiqiao, et al.
Published: (2025)

SIGMA: Search-Augmented On-Demand Knowledge Integration for Agentic Mathematical Reasoning
by: Asgarov, Ali, et al.
Published: (2025)

Hold Onto That Thought: Assessing KV Cache Compression On Reasoning
by: Liu, Minghui, et al.
Published: (2025)

RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding
by: Zhang, Zihong, et al.
Published: (2026)

AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models
by: Zhang, Qianchi, et al.
Published: (2024)

Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications
by: Lecu, Alexandru, et al.
Published: (2025)

KERAG: Knowledge-Enhanced Retrieval-Augmented Generation for Advanced Question Answering
by: Sun, Yushi, et al.
Published: (2025)

Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores
by: Chari, Vivek, et al.
Published: (2025)

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
by: Cai, Zefan, et al.
Published: (2024)

Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation
by: Agarwal, Shubham, et al.
Published: (2025)