| Main Authors: | Wang, Peng; Liu, Yu; Liu, Ziqi; Wang, Ming-Yang; Liu, Ke; Zhou, Ke; Huang, Zhihai |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.16262 |
Similar Items
OnePiece: A Large-Scale Distributed Inference System with RDMA for Complex AI-Generated Content (AIGC) Workflows
by: Chen, June, et al.
Published: (2026)
CoCoI: Distributed Coded Inference System for Straggler Mitigation
by: Liu, Xing, et al.
Published: (2025)
Content-Oblivious Leader Election in 2-Edge-Connected Networks
by: Chalopin, Jérémie, et al.
Published: (2025)
Big Data-Driven Fraud Detection Using Machine Learning and Real-Time Stream Processing
by: Liu, Chen, et al.
Published: (2025)
PROSERVE: Unified Multi-Priority Request Scheduling for LLM Serving
by: Huang, Weizhe, et al.
Published: (2025)
OOCO: Latency-disaggregated Architecture for Online-Offline Co-locate LLM Serving
by: Wu, Siyu, et al.
Published: (2025)
LLM-CoOpt: A Co-Design and Optimization Framework for Efficient LLM Inference on Heterogeneous Platforms
by: Kong, Jie, et al.
Published: (2026)
Efficient Counting and Simulation in Content-Oblivious Rings
by: Chalopin, Jérémie, et al.
Published: (2026)
Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction
by: Cheng, Ke, et al.
Published: (2024)
FaaSTube: Optimizing GPU-oriented Data Transfer for Serverless Computing
by: Wu, Hao, et al.
Published: (2024)
HydraInfer: Hybrid Disaggregated Scheduling for Multimodal Large Language Model Serving
by: Dong, Xianzhe, et al.
Published: (2025)
Integrated Sensing, Communication, and Computing: An Information-oriented Resource Transaction Mechanism
by: Chen, Ning, et al.
Published: (2024)
A Survey on Adversarial Contention Resolution
by: Banicescu, Ioana, et al.
Published: (2024)
Softening the Impact of Collisions in Contention Resolution
by: Biswas, Umesh, et al.
Published: (2024)
Beyond 2-Edge-Connectivity: Algorithms and Impossibility for Content-Oblivious Leader Election
by: Chang, Yi-Jun, et al.
Published: (2025)
Non-Uniform Content-Oblivious Leader Election on Oriented Asynchronous Rings
by: Chalopin, Jérémie, et al.
Published: (2025)
HexGen: Generative Inference of Large Language Model over Heterogeneous Environment
by: Jiang, Youhe, et al.
Published: (2023)
TIDAL: Recovering Temporal Phase for Cloud Block Storage Placement from LLM-Derived Semantics
by: Tan, Difan, et al.
Published: (2026)
QoE-oriented Dependent Task Scheduling under Multi-dimensional QoS Constraints over Distributed Networks
by: Fan, Xuwei, et al.
Published: (2023)
Graph-Structured Deep Learning Framework for Multi-task Contention Identification with High-dimensional Metrics
by: Yang, Xiao, et al.
Published: (2026)
A Contention-Free Model for Converged Kubernetes on HPC
by: Sochat, Vanessa, et al.
Published: (2024)
Warp-STAR: High-performance, Differentiable GPU-Accelerated Static Timing Analysis through Warp-oriented Parallel Orchestration
by: Huang, En-Ming, et al.
Published: (2026)
FedHC: A Hierarchical Clustered Federated Learning Framework for Satellite Networks
by: Liu, Zhuocheng, et al.
Published: (2025)
Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
by: Cheng, Ke, et al.
Published: (2024)
HexAGenT: Efficient Agentic LLM Serving via Workflow- and Heterogeneity-Aware Scheduling
by: Peng, You, et al.
Published: (2026)
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
by: Zhao, Xuanlei, et al.
Published: (2024)
GPU-Accelerated Batch-Dynamic Subgraph Matching
by: Qiu, Linshan, et al.
Published: (2024)
Accelerating Microswimmer Simulations via a Heterogeneous Pipelined Parallel-in-Time Framework
by: Huang, Ruixiang, et al.
Published: (2026)
Harpagon: Minimizing DNN Serving Cost via Efficient Dispatching, Scheduling and Splitting
by: Zhao, Zhixin, et al.
Published: (2024)
FedCache: A Knowledge Cache-driven Federated Learning Architecture for Personalized Edge Intelligence
by: Wu, Zhiyuan, et al.
Published: (2023)
Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture
by: Wu, Yu, et al.
Published: (2025)
A Thorough Investigation of Content-Defined Chunking Algorithms for Data Deduplication
by: Gregoriadis, Marcel, et al.
Published: (2024)
AdaBridge: Dynamic Data and Computation Reuse for Efficient Multi-task DNN Co-evolution in Edge Systems
by: Wang, Lehao, et al.
Published: (2024)
ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production
by: Xiang, Yuxing, et al.
Published: (2025)
SCOOT: SLO-Oriented Performance Tuning for LLM Inference Engines
by: Cheng, Ke, et al.
Published: (2024)
Accelerating the Delivery of Data Services over Uncertain Mobile Crowdsensing Networks
by: Liwang, Minghui, et al.
Published: (2022)
DiT-HC: Enabling Efficient Training of Visual Generation Model DiT on HPC-oriented CPU Cluster
by: Zhang, Jinxiao, et al.
Published: (2026)
ConChain: A Scheme for Contention-free and Attack Resilient BlockChain
by: Bappy, Faisal Haque, et al.
Published: (2023)
BandPilot: Towards Performance- and Contention-Aware GPU Dispatching in AI Clusters
by: Zhang, Kunming, et al.
Published: (2025)
Contention Resolution, With and Without a Global Clock
by: Cai, Zixi, et al.
Published: (2026)