| Main Authors: | Biton, Dvir David, Friedman, Roy |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.03301 |
Similar Items
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
by: Zhu, Yuxuan, et al.
Published: (2025)
dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching
by: Liu, Zhiyuan, et al.
Published: (2025)
CaRT: Teaching LLM Agents to Know When They Know Enough
by: Liu, Grace, et al.
Published: (2025)
Screening Is Enough
by: Nakanishi, Ken M.
Published: (2026)
Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
by: Saxena, Utkarsh, et al.
Published: (2024)
KV Cache Transform Coding for Compact Storage in LLM Inference
by: Staniszewski, Konrad, et al.
Published: (2025)
PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents
by: Gu, Zhuohan, et al.
Published: (2026)
TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
by: Dzikanyanga, Gradwell, et al.
Published: (2026)
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)
Enough Coin Flips Can Make LLMs Act Bayesian
by: Gupta, Ritwik, et al.
Published: (2025)
Understanding LLM Embeddings for Regression
by: Tang, Eric, et al.
Published: (2024)
Static Word Embeddings for Sentence Semantic Representation
by: Wada, Takashi, et al.
Published: (2025)
MeanCache: User-Centric Semantic Caching for LLM Web Services
by: Gill, Waris, et al.
Published: (2024)
User-LLM: Efficient LLM Contextualization with User Embeddings
by: Ning, Lin, et al.
Published: (2024)
Reward Is Enough: LLMs Are In-Context Reinforcement Learners
by: Song, Kefan, et al.
Published: (2025)
KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
by: Kang, Hao, et al.
Published: (2024)
On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?
by: Geng, Mingmeng, et al.
Published: (2025)
BPO: Staying Close to the Behavior LLM Creates Better Online LLM Alignment
by: Xu, Wenda, et al.
Published: (2024)
Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
by: Tang, Pingzhi, et al.
Published: (2026)
AgenticCache: Cache-Driven Asynchronous Planning for Embodied AI Agents
by: Kim, Hojoon, et al.
Published: (2026)
Generalist Foundation Models Are Not Clinical Enough for Hospital Operations
by: Jiang, Lavender Y., et al.
Published: (2025)
Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
by: He, Zhenyu, et al.
Published: (2024)
BERT-JEPA: Reorganizing CLS Embeddings for Language-Invariant Semantics
by: Gillin, Taj, et al.
Published: (2026)
A General Framework for Producing Interpretable Semantic Text Embeddings
by: Sun, Yiqun, et al.
Published: (2024)
Confidence-aware Self-Semantic Distillation on Knowledge Graph Embedding
by: Liu, Yichen, et al.
Published: (2022)
Output Embedding Centering for Stable LLM Pretraining
by: Stollenwerk, Felix, et al.
Published: (2026)
Aligned at the Start: Conceptual Groupings in LLM Embeddings
by: Khatir, Mehrdad, et al.
Published: (2024)
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
by: Liu, Akide, et al.
Published: (2024)
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step
by: Dugan, Owen, et al.
Published: (2024)
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
by: Yang, Junxiao, et al.
Published: (2026)
When Less is Enough: Efficient Inference via Collaborative Reasoning
by: Chen, Yilei, et al.
Published: (2026)
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
by: Zhuang, Haomin, et al.
Published: (2024)
Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings
by: Grewal, Yashvir S., et al.
Published: (2024)
Representing Rule-based Chatbots with Transformers
by: Friedman, Dan, et al.
Published: (2024)
DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
by: Liu, Yuhan, et al.
Published: (2024)
Sparse is Enough in Fine-tuning Pre-trained Large Language Models
by: Song, Weixi, et al.
Published: (2023)
Open or Closed LLM for Lesser-Resourced Languages? Lessons from Greek
by: Pavlopoulos, John, et al.
Published: (2025)
The Semantic Illusion: Certified Limits of Embedding-Based Hallucination Detection in RAG Systems
by: Sinha, Debu
Published: (2025)
LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
by: Shi, Dachuan, et al.
Published: (2025)