:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Qian; Yousefijamarani, Zahra; Heisler, Morgan Lindsay; Gu, Rongzhi; Xiaolong, Bai; Yizhou, Shan; Zhang, Wei; Lan, Wang; Xiong, Ying; Zhang, Yong; Fan, Zhenan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2512.16822

Similar Items

HFX: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling
by: Yousefijamarani, Zahra, et al.
Published: (2025)

EPIC: Efficient Position-Independent Caching for Serving Large Language Models
by: Hu, Junhao, et al.
Published: (2024)

DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing
by: Jang, Daesik, et al.
Published: (2026)

Enhancing Learned Knowledge in LoRA Adapters Through Efficient Contrastive Decoding on Ascend NPUs
by: Heisler, Morgan Lindsay, et al.
Published: (2025)

DL-PIM: Improving Data Locality in Processing-in-Memory Systems
by: Tian, Parker Hao, et al.
Published: (2025)

ExpertWeave: Efficiently Serving Expert-Specialized Fine-Tuned Adapters at Scale
by: Shi, Ge, et al.
Published: (2025)

MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool
by: Hu, Cunchen, et al.
Published: (2024)

Efficiently Serving Large Multimodal Models Using EPD Disaggregation
by: Singh, Gursimran, et al.
Published: (2024)

Do LLMs Align with My Task? Evaluating Text-to-SQL via Dataset Alignment
by: Rafiei, Davood, et al.
Published: (2025)

SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding
by: Heisler, Morgan, et al.
Published: (2022)

Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving
by: Ma, Bole, et al.
Published: (2026)

Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management
by: Zhang, Hang, et al.
Published: (2025)

MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving
by: Zhao, Shiju, et al.
Published: (2025)

morgan-heisler/DeckBench: v1.0.0 — DECKBench Initial Release (KDD 2026)
by: dsjang2, et al.
Published: (2026)

LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy
by: Zhang, Rongzhi, et al.
Published: (2024)

MorphServe: Efficient and Workload-Aware LLM Serving via Runtime Quantized Layer Swapping and KV Cache Resizing
by: Su, Zhaoyuan, et al.
Published: (2025)

CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
by: Yao, Jiayi, et al.
Published: (2024)

SparseX: Efficient Segment-Level KV Cache Sharing for Interleaved LLM Serving
by: Zhang, Quqing, et al.
Published: (2026)

Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing
by: Xia, Tianhua, et al.
Published: (2025)

ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments
by: Li, Haley, et al.
Published: (2026)

LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management
by: Xiong, Yi, et al.
Published: (2024)

BLITZSCALE: Fast and Live Large Model Autoscaling with O(1) Host Caching
by: Zhang, Dingyan, et al.
Published: (2024)

You Need an Encoder for Native Position-Independent Caching
by: Zhao, Shiju, et al.
Published: (2026)

CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
by: Liu, Yuhan, et al.
Published: (2023)

Global Rotation of Skyrmion Bags under Vertical Microwave Fields
by: Bo, Lan, et al.
Published: (2024)

BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions
by: Wang, Quancheng, et al.
Published: (2023)

CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference
by: Li, Suyi, et al.
Published: (2024)

Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving
by: Gao, Shihong, et al.
Published: (2025)

A Multi‐Dimensional Feature Fusion Framework With XGBoost for IIoT‐Driven Behavioral Analytics in Industrial Internet Systems
by: Jiaqi Wang, et al.
Published: (2025)

EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving
by: Feng, Shaoting, et al.
Published: (2025)

Uso de plantas medicinales en el cuidado de la salud: la producción científica de tesis y disertaciones de enfermería brasileña
by: Elisa Vanessa Heisler
Published: (2015)

“Populism and Democracy.” A review of The Age of Discontent: Populism, Extremism, and Conspiracy Theories in Contemporary Democracies. By Matthew Rhodes‐Purdy, Rachel Navarre and Stephen Utych, Cambridge, UK: Cambridge University Press. 2024. $34.99 (pbk); $110.00 (hbk); $110.00 (ebk)
by: Barbara Schmitter Heisler
Published: (2024)

InstCache: A Predictive Cache for LLM Serving
by: Zou, Longwei, et al.
Published: (2024)

DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving
by: Yuan, Ying, et al.
Published: (2026)

DDiT: Dynamic Resource Allocation for Diffusion Transformer Model Serving
by: Huang, Heyang, et al.
Published: (2025)

Memory augment is All You Need for image restoration
by: Zhang, Xiao Feng, et al.
Published: (2023)

DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
by: Liu, Yuhan, et al.
Published: (2024)

LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation
by: Xiao, Yang, et al.
Published: (2025)

RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction
by: Liu, Sihao, et al.
Published: (2026)

Continuous Semantic Caching for Low-Cost LLM Serving
by: Atalar, Baran, et al.
Published: (2026)