| Main Authors: | Li, Dacheng, Cao, Shiyi, Griggs, Tyler, Liu, Shu, Mo, Xiangxi, Tang, Eric, Hegde, Sumanth, Hakhamaneshi, Kourosh, Patil, Shishir G., Zaharia, Matei, Gonzalez, Joseph E., Stoica, Ion |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.07374 |
Similar Items
SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent
by: Cao, Shiyi, et al.
Published: (2025)
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
by: Cao, Shiyi, et al.
Published: (2024)
Reasoning Models Can Be Effective Without Thinking
by: Ma, Wenjie, et al.
Published: (2025)
RAFT: Adapting Language Model to Domain Specific RAG
by: Zhang, Tianjun, et al.
Published: (2024)
Optimizing LLM Queries in Relational Data Analytics Workloads
by: Liu, Shu, et al.
Published: (2024)
Pie: Pooling CPU Memory for LLM Inference
by: Xu, Yi, et al.
Published: (2024)
The Price Reversal Phenomenon: When Cheaper Reasoning Models Cost More
by: Chen, Lingjiao, et al.
Published: (2026)
Delta Fair Sharing: Performance Isolation for Multi-Tenant Storage Systems
by: Griggs, Tyler, et al.
Published: (2026)
HashAttention: Semantic Sparsity for Faster Inference
by: Desai, Aditya, et al.
Published: (2024)
Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design
by: Davis, Jared Quincy, et al.
Published: (2024)
Specifications: The missing link to making the development of LLM systems an engineering discipline
by: Stoica, Ion, et al.
Published: (2024)
RAG over Thinking Traces Can Improve Reasoning Tasks
by: Arabzadeh, Negar, et al.
Published: (2026)
DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis
by: Patel, Liana, et al.
Published: (2025)
MemGPT: Towards LLMs as Operating Systems
by: Packer, Charles, et al.
Published: (2023)
Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
by: Chen, Lingjiao, et al.
Published: (2024)
Optimizing Model Selection for Compound AI Systems
by: Chen, Lingjiao, et al.
Published: (2025)
AI-Driven Research for Databases
by: Cheng, Audrey, et al.
Published: (2026)
BARE: Leveraging Base Language Models for Few-Shot Synthetic Data Generation
by: Zhu, Alan, et al.
Published: (2025)
Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
by: Griggs, Tyler, et al.
Published: (2024)
Accelerating Direct Preference Optimization with Prefix Sharing
by: Wang, Franklin, et al.
Published: (2024)
Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines
by: Arabzadeh, Negar, et al.
Published: (2026)
vAttention: Verified Sparse Attention
by: Desai, Aditya, et al.
Published: (2025)
Some Present-Day Problems of Romanian Library Science
by: Stoica, Ion
Published: (1973)
The Central University Library, Bucharest. Over Seventy-five Years in the History of a Collection
by: Stoica, Ion
Published: (1972)
K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model
by: Cao, Shiyi, et al.
Published: (2026)
Semi-Supervised One-Shot Imitation Learning
by: Wu, Philipp, et al.
Published: (2024)
The Time is Here for Just-in-Time Systems: Challenges and Opportunities
by: Liu, Shu, et al.
Published: (2026)
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
by: Jiang, Xuanlin, et al.
Published: (2024)
Fairness in Serving Large Language Models
by: Sheng, Ying, et al.
Published: (2023)
MPC-Minimized Secure LLM Inference
by: Rathee, Deevashwer, et al.
Published: (2024)
AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization
by: Cemri, Mert, et al.
Published: (2026)
LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess
by: Kolasani, Sai, et al.
Published: (2025)
SIEVE: Sample-Efficient Parametric Learning from Natural Language
by: Asawa, Parth, et al.
Published: (2026)
SkyServe: Serving AI Models across Regions and Clouds with Spot Instances
by: Mao, Ziming, et al.
Published: (2024)
S*: Test Time Scaling for Code Generation
by: Li, Dacheng, et al.
Published: (2025)
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
by: Opsahl-Ong, Krista, et al.
Published: (2024)
Resilience Quantification and its Support for Operational Resilience
by: Matei, Ion, et al.
Published: (2026)
Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems
by: Agarwal, Shubham, et al.
Published: (2026)
LEANN: A Low-Storage Vector Index
by: Wang, Yichuan, et al.
Published: (2025)
ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data
by: Patel, Liana, et al.
Published: (2024)