:: Library Catalog

Bibliographic Details
Main Authors:	Biton, Dvir David, Friedman, Roy
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2603.03301

Similar Items

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
by: Zhu, Yuxuan, et al.
Published: (2025)

dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching
by: Liu, Zhiyuan, et al.
Published: (2025)

CaRT: Teaching LLM Agents to Know When They Know Enough
by: Liu, Grace, et al.
Published: (2025)

Screening Is Enough
by: Nakanishi, Ken M.
Published: (2026)

Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
by: Saxena, Utkarsh, et al.
Published: (2024)

KV Cache Transform Coding for Compact Storage in LLM Inference
by: Staniszewski, Konrad, et al.
Published: (2025)

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents
by: Gu, Zhuohan, et al.
Published: (2026)

TTKV: Temporal-Tiered KV Cache for Long-Context LLM Inference
by: Dzikanyanga, Gradwell, et al.
Published: (2026)

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)

Enough Coin Flips Can Make LLMs Act Bayesian
by: Gupta, Ritwik, et al.
Published: (2025)

Understanding LLM Embeddings for Regression
by: Tang, Eric, et al.
Published: (2024)

Static Word Embeddings for Sentence Semantic Representation
by: Wada, Takashi, et al.
Published: (2025)

MeanCache: User-Centric Semantic Caching for LLM Web Services
by: Gill, Waris, et al.
Published: (2024)

User-LLM: Efficient LLM Contextualization with User Embeddings
by: Ning, Lin, et al.
Published: (2024)

Reward Is Enough: LLMs Are In-Context Reinforcement Learners
by: Song, Kefan, et al.
Published: (2025)

KeepKV: Achieving Periodic Lossless KV Cache Compression for Efficient LLM Inference
by: Tian, Yuxuan, et al.
Published: (2025)

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
by: Kang, Hao, et al.
Published: (2024)

On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?
by: Geng, Mingmeng, et al.
Published: (2025)

BPO: Staying Close to the Behavior LLM Creates Better Online LLM Alignment
by: Xu, Wenda, et al.
Published: (2024)

Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
by: Tang, Pingzhi, et al.
Published: (2026)

AgenticCache: Cache-Driven Asynchronous Planning for Embodied AI Agents
by: Kim, Hojoon, et al.
Published: (2026)

Generalist Foundation Models Are Not Clinical Enough for Hospital Operations
by: Jiang, Lavender Y., et al.
Published: (2025)

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
by: He, Zhenyu, et al.
Published: (2024)

BERT-JEPA: Reorganizing CLS Embeddings for Language-Invariant Semantics
by: Gillin, Taj, et al.
Published: (2026)

A General Framework for Producing Interpretable Semantic Text Embeddings
by: Sun, Yiqun, et al.
Published: (2024)

Confidence-aware Self-Semantic Distillation on Knowledge Graph Embedding
by: Liu, Yichen, et al.
Published: (2022)

Output Embedding Centering for Stable LLM Pretraining
by: Stollenwerk, Felix, et al.
Published: (2026)

Aligned at the Start: Conceptual Groupings in LLM Embeddings
by: Khatir, Mehrdad, et al.
Published: (2024)

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
by: Liu, Akide, et al.
Published: (2024)

OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step
by: Dugan, Owen, et al.
Published: (2024)

LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
by: Yang, Junxiao, et al.
Published: (2026)

When Less is Enough: Efficient Inference via Collaborative Reasoning
by: Chen, Yilei, et al.
Published: (2026)

SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
by: Zhuang, Haomin, et al.
Published: (2024)

Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings
by: Grewal, Yashvir S., et al.
Published: (2024)

Representing Rule-based Chatbots with Transformers
by: Friedman, Dan, et al.
Published: (2024)

DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
by: Liu, Yuhan, et al.
Published: (2024)

Sparse is Enough in Fine-tuning Pre-trained Large Language Models
by: Song, Weixi, et al.
Published: (2023)

Open or Closed LLM for Lesser-Resourced Languages? Lessons from Greek
by: Pavlopoulos, John, et al.
Published: (2025)

The Semantic Illusion: Certified Limits of Embedding-Based Hallucination Detection in RAG Systems
by: Sinha, Debu
Published: (2025)

LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
by: Shi, Dachuan, et al.
Published: (2025)