| Main Authors: | Liu, Banruo; Lin, Wei-Yu; Fang, Minghao; Jiang, Yihan; Lai, Fan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.16397 |
Similar Items
JITServe: SLO-aware LLM Serving with Imprecise Request Information
by: Zhang, Wei, et al.
Published: (2025)
Budget-aware Query Tuning: An AutoML Perspective
by: Wu, Wentao, et al.
Published: (2024)
MaskSearch: Querying Image Masks at Scale
by: He, Dong, et al.
Published: (2023)
HedraRAG: Coordinating LLM Generation and Database Retrieval in Heterogeneous RAG Serving
by: Hu, Zhengding, et al.
Published: (2025)
PolyServe: Efficient Multi-SLO Serving at Scale
by: Zhu, Kan, et al.
Published: (2025)
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
by: Zeng, Pai, et al.
Published: (2024)
AdaServe: Accelerating Multi-SLO LLM Serving with SLO-Customized Speculative Decoding
by: Li, Zikun, et al.
Published: (2025)
ACE: A Cardinality Estimator for Set-Valued Queries
by: Sheng, Yufan, et al.
Published: (2025)
Hydro: Adaptive Query Processing of ML Queries
by: Kakkar, Gaurav Tarlok, et al.
Published: (2024)
RAGPulse: An Open-Source RAG Workload Trace to Optimize RAG Serving Systems
by: Wang, Zhengchao, et al.
Published: (2025)
Efficient Vector Search in the Wild: One Model for Multi-K Queries
by: Peng, Yifan, et al.
Published: (2026)
SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving
by: Kakolyris, Andreas Kosmas, et al.
Published: (2024)
RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems
by: Ouyang, Biao, et al.
Published: (2025)
Rethinking Caching for LLM Serving Systems: Beyond Traditional Heuristics
by: Kim, Jungwoo, et al.
Published: (2025)
ResidualPlanner+: a scalable matrix mechanism for marginals and beyond
by: Xiao, Yingtai, et al.
Published: (2023)
Demonstration of MaskSearch: Efficiently Querying Image Masks for Machine Learning Workflows
by: Wei, Lindsey Linxi, et al.
Published: (2024)
Private Queries with Sigma-Counting
by: Gao, Jun, et al.
Published: (2025)
Optimizing LLM Queries in Relational Data Analytics Workloads
by: Liu, Shu, et al.
Published: (2024)
The Unreasonable Effectiveness of LLMs for Query Optimization
by: Akioyamen, Peter, et al.
Published: (2024)
TranSQL+: Serving Large Language Models with SQL on Low-Resource Hardware
by: Sun, Wenbo, et al.
Published: (2025)
Low Rank Learning for Offline Query Optimization
by: Yi, Zixuan, et al.
Published: (2025)
Predictive Query-based Pipeline for Graph Data
by: Neto, Plácido A Souza
Published: (2024)
Incorporating Deep Learning Design in Database Queries
by: Lubarsky, Yuval Lev, et al.
Published: (2026)
Adversarial Query Synthesis via Bayesian Optimization
by: Tao, Jeffrey, et al.
Published: (2026)
Sibyl: Forecasting Time-Evolving Query Workloads
by: Huang, Hanxian, et al.
Published: (2024)
EdgeServe: A Streaming System for Decentralized Model Serving
by: Shaowang, Ted, et al.
Published: (2023)
Improving DBMS Scheduling Decisions with Fine-grained Performance Prediction on Concurrent Queries -- Extended
by: Wu, Ziniu, et al.
Published: (2025)
A Declarative Query Language for Scientific Machine Learning
by: Jamil, Hasan M
Published: (2024)
SafeLoad: Efficient Admission Control Framework for Identifying Memory-Overloading Queries in Cloud Data Warehouses
by: Wu, Yifan, et al.
Published: (2026)
MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases
by: Lin, Zhisheng, et al.
Published: (2024)
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
by: Kim, Jang-Hyun, et al.
Published: (2025)
SemBench: A Benchmark for Semantic Query Processing Engines
by: Lao, Jiale, et al.
Published: (2025)
Training-Free Query Optimization via LLM-Based Plan Similarity
by: Vasilenko, Nikita, et al.
Published: (2025)
RELOAD: A Robust and Efficient Learned Query Optimizer for Database Systems
by: Lee, Seokwon, et al.
Published: (2026)
LearnedWMP: Workload Memory Prediction Using Distribution of Query Templates
by: Quader, Shaikh, et al.
Published: (2024)
MACE: A Hybrid LLM Serving System with Colocated SLO-aware Continuous Retraining Alignment
by: Li, Yufei, et al.
Published: (2025)
Counting Still Counts: Understanding Neural Complex Query Answering Through Query Relaxation
by: Brunink, Yannick, et al.
Published: (2025)
Tradeoffs in Processing Queries and Supporting Updates over an ML-Enhanced R-tree
by: Al-Mamun, Abdullah, et al.
Published: (2025)
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies
by: Tang, Zirui, et al.
Published: (2026)
Cross-domain-aware Worker Selection with Training for Crowdsourced Annotation
by: Sun, Yushi, et al.
Published: (2024)