| Main Authors: | Agrawal, Rishabh; Kumar, Himanshu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.08261 |
Similar Items
More Than a Quick Glance: Overcoming the Greedy Bias in KV-Cache Compression
by: Sood, Aryan, et al.
Published: (2026)
CacheFocus: Dynamic Cache Re-Positioning for Efficient Retrieval-Augmented Generation
by: Lee, Kun-Hui, et al.
Published: (2025)
From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation
by: Wang, Jiahao, et al.
Published: (2026)
Lossless KV Cache Compression to 2%
by: Yang, Zhen, et al.
Published: (2024)
Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection
by: Şenol, Ali, et al.
Published: (2025)
CAG: Chunked Augmented Generation for Google Chrome's Built-in Gemini Nano
by: Surulimuthu, Vivek Vellaiyappan, et al.
Published: (2024)
Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion
by: Baek, Jinheon, et al.
Published: (2023)
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
by: Fu, Yu, et al.
Published: (2024)
Enhancing RAG Efficiency with Adaptive Context Compression
by: Guo, Shuyu, et al.
Published: (2025)
Conversation AI Dialog for Medicare powered by Finetuning and Retrieval Augmented Generation
by: Agrawal, Atharva Mangeshkumar, et al.
Published: (2025)
One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
by: Lu, Liming, et al.
Published: (2026)
"Lost-in-the-Later": Framework for Quantifying Contextual Grounding in Large Language Models
by: Tao, Yufei, et al.
Published: (2025)
Use Graph When It Needs: Efficiently and Adaptively Integrating Retrieval-Augmented Generation with Graphs
by: Dong, Su, et al.
Published: (2026)
EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation
by: Hwang, Taeho, et al.
Published: (2024)
KVSculpt: KV Cache Compression as Distillation
by: Jiang, Bo, et al.
Published: (2026)
CommVQ: Commutative Vector Quantization for KV Cache Compression
by: Li, Junyan, et al.
Published: (2025)
HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference
by: Shi, Zhiyuan, et al.
Published: (2026)
Retrieval-Augmented Generation with Hierarchical Knowledge
by: Huang, Haoyu, et al.
Published: (2025)
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
by: Liu, Akide, et al.
Published: (2024)
Personas within Parameters: Fine-Tuning Small Language Models with Low-Rank Adapters to Mimic User Behaviors
by: Thakur, Himanshu, et al.
Published: (2025)
Detecting and Mitigating Bias in LLMs through Knowledge Graph-Augmented Training
by: Kumar, Rajeev, et al.
Published: (2025)
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning
by: Corallo, Giulio, et al.
Published: (2025)
Enhancing Presentation Slide Generation by LLMs with a Multi-Staged End-to-End Approach
by: Bandyopadhyay, Sambaran, et al.
Published: (2024)
KVReviver: Reversible KV Cache Compression with Sketch-Based Token Reconstruction
by: Yuan, Aomufei, et al.
Published: (2025)
Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression
by: Godey, Nathan, et al.
Published: (2025)
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
by: Cai, Zefan, et al.
Published: (2025)
EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection
by: Zhou, Yuhao, et al.
Published: (2025)
ARise: Towards Knowledge-Augmented Reasoning via Risk-Adaptive Search
by: Zhang, Yize, et al.
Published: (2025)
Knowledge Graph-Guided Retrieval Augmented Generation
by: Zhu, Xiangrong, et al.
Published: (2025)
Contextual Reinforcement in Multimodal Token Compression for Large Language Models
by: Piero, Naderdel, et al.
Published: (2025)
SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression
by: Jin, Yiqiao, et al.
Published: (2025)
SIGMA: Search-Augmented On-Demand Knowledge Integration for Agentic Mathematical Reasoning
by: Asgarov, Ali, et al.
Published: (2025)
Hold Onto That Thought: Assessing KV Cache Compression On Reasoning
by: Liu, Minghui, et al.
Published: (2025)
RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding
by: Zhang, Zihong, et al.
Published: (2026)
AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models
by: Zhang, Qianchi, et al.
Published: (2024)
Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications
by: Lecu, Alexandru, et al.
Published: (2025)
KERAG: Knowledge-Enhanced Retrieval-Augmented Generation for Advanced Question Answering
by: Sun, Yushi, et al.
Published: (2025)
Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores
by: Chari, Vivek, et al.
Published: (2025)
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
by: Cai, Zefan, et al.
Published: (2024)
Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation
by: Agarwal, Shubham, et al.
Published: (2025)