| Main Authors: | Wang, Qian, Yousefijamarani, Zahra, Heisler, Morgan Lindsay, Gu, Rongzhi, Xiaolong, Bai, Yizhou, Shan, Zhang, Wei, Lan, Wang, Xiong, Ying, Zhang, Yong, Fan, Zhenan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.16822 |
Similar Items
HFX: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling
by: Yousefijamarani, Zahra, et al.
Published: (2025)
EPIC: Efficient Position-Independent Caching for Serving Large Language Models
by: Hu, Junhao, et al.
Published: (2024)
DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing
by: Jang, Daesik, et al.
Published: (2026)
Enhancing Learned Knowledge in LoRA Adapters Through Efficient Contrastive Decoding on Ascend NPUs
by: Heisler, Morgan Lindsay, et al.
Published: (2025)
DL-PIM: Improving Data Locality in Processing-in-Memory Systems
by: Tian, Parker Hao, et al.
Published: (2025)
ExpertWeave: Efficiently Serving Expert-Specialized Fine-Tuned Adapters at Scale
by: Shi, Ge, et al.
Published: (2025)
MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool
by: Hu, Cunchen, et al.
Published: (2024)
Efficiently Serving Large Multimodal Models Using EPD Disaggregation
by: Singh, Gursimran, et al.
Published: (2024)
Do LLMs Align with My Task? Evaluating Text-to-SQL via Dataset Alignment
by: Rafiei, Davood, et al.
Published: (2025)
SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding
by: Heisler, Morgan, et al.
Published: (2022)
Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving
by: Ma, Bole, et al.
Published: (2026)
Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management
by: Zhang, Hang, et al.
Published: (2025)
MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving
by: Zhao, Shiju, et al.
Published: (2025)
morgan-heisler/DeckBench: v1.0.0 — DECKBench Initial Release (KDD 2026)
by: dsjang2, et al.
Published: (2026)
LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy
by: Zhang, Rongzhi, et al.
Published: (2024)
MorphServe: Efficient and Workload-Aware LLM Serving via Runtime Quantized Layer Swapping and KV Cache Resizing
by: Su, Zhaoyuan, et al.
Published: (2025)
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
by: Yao, Jiayi, et al.
Published: (2024)
SparseX: Efficient Segment-Level KV Cache Sharing for Interleaved LLM Serving
by: Zhang, Quqing, et al.
Published: (2026)
Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing
by: Xia, Tianhua, et al.
Published: (2025)
ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments
by: Li, Haley, et al.
Published: (2026)
LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management
by: Xiong, Yi, et al.
Published: (2024)
BLITZSCALE: Fast and Live Large Model Autoscaling with O(1) Host Caching
by: Zhang, Dingyan, et al.
Published: (2024)
You Need an Encoder for Native Position-Independent Caching
by: Zhao, Shiju, et al.
Published: (2026)
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
by: Liu, Yuhan, et al.
Published: (2023)
Global Rotation of Skyrmion Bags under Vertical Microwave Fields
by: Bo, Lan, et al.
Published: (2024)
BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions
by: Wang, Quancheng, et al.
Published: (2023)
CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference
by: Li, Suyi, et al.
Published: (2024)
Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving
by: Gao, Shihong, et al.
Published: (2025)
A Multi‐Dimensional Feature Fusion Framework With XGBoost for IIoT‐Driven Behavioral Analytics in Industrial Internet Systems
by: Jiaqi Wang, et al.
Published: (2025)
EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving
by: Feng, Shaoting, et al.
Published: (2025)
Uso de plantas medicinales en el cuidado de la salud: la producción científica de tesis y disertaciones de enfermería brasileña
by: Elisa Vanessa Heisler
Published: (2015)
“Populism and Democracy.” A review of The Age of Discontent: Populism, Extremism and Conspiracy Theories in Contemporary Democracies. By Mathew Rhodes‐Purdy, Rachel Navaree and Stephen Utych, Cambridge, UK: Cambridge University Press. 2024. $34.99 (pbk); $110.00 (hbk); $110.00 (ebk)
by: Barbara Schmitter Heisler
Published: (2024)
InstCache: A Predictive Cache for LLM Serving
by: Zou, Longwei, et al.
Published: (2024)
DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving
by: Yuan, Ying, et al.
Published: (2026)
DDiT: Dynamic Resource Allocation for Diffusion Transformer Model Serving
by: Huang, Heyang, et al.
Published: (2025)
Memory augment is All You Need for image restoration
by: Zhang, Xiao Feng, et al.
Published: (2023)
DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
by: Liu, Yuhan, et al.
Published: (2024)
LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation
by: Xiao, Yang, et al.
Published: (2025)
RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
by: Liu, Sihao, et al.
Published: (2026)
Continuous Semantic Caching for Low-Cost LLM Serving
by: Atalar, Baran, et al.
Published: (2026)