| Main Authors: | O'Quinn, Austin; Snedeker, Conor; Zhang, Siyuan; Kline, Jenna |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.03070 |
Similar Items
ACE-GNN: Adaptive GNN Co-Inference with System-Aware Scheduling in Dynamic Edge Environments
by: Zhou, Ao, et al.
Published: (2025)
Private Model Personalization Revisited
by: Snedeker, Conor, et al.
Published: (2025)
Adaptive Configuration Selection for Multi-Model Inference Pipelines in Edge Computing
by: Sheng, Jinhao, et al.
Published: (2025)
A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference
by: Zhang, Yida, et al.
Published: (2026)
PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding
by: Han, Yunhe, et al.
Published: (2026)
Toward Sustainability-Aware LLM Inference on Edge Clusters
by: Rajashekar, Kolichala, et al.
Published: (2025)
Modular Foundation Model Inference at the Edge: Network-Aware Microservice Optimization
by: Zhu, Juan, et al.
Published: (2026)
OCTOPINF: Workload-Aware Inference Serving for Edge Video Analytics
by: Nguyen, Thanh-Tung, et al.
Published: (2025)
Power Aware Dynamic Reallocation For Inference
by: Jiang, Yiwei, et al.
Published: (2026)
Preemption Aware Task Scheduling for Priority and Deadline Constrained DNN Inference Task Offloading in Homogeneous Mobile-Edge Networks
by: Cotter, Jamie, et al.
Published: (2025)
Infer-EDGE: Dynamic DNN Inference Optimization in 'Just-in-time' Edge-AI Implementations
by: Mounesan, Motahare, et al.
Published: (2025)
SneakPeek: Data-Aware Model Selection and Scheduling for Inference Serving on the Edge
by: Wolfrath, Joel, et al.
Published: (2025)
EdgeShard: Efficient LLM Inference via Collaborative Edge Computing
by: Zhang, Mingjin, et al.
Published: (2024)
Communication-Computation Pipeline Parallel Split Learning over Wireless Edge Networks
by: Liu, Chenyu, et al.
Published: (2025)
EdgeServing: Deadline-Aware Multi-DNN Serving at the Edge
by: Cao, Jiahe, et al.
Published: (2026)
Cicada: A Pipeline-Efficient Approach to Serverless Inference with Decoupled Management
by: Wu, Z., et al.
Published: (2025)
PARD: Enhancing Goodput for Inference Pipeline via Proactive Request Dropping
by: Zhao, Zhixin, et al.
Published: (2026)
MoA-Off: Adaptive Heterogeneous Modality-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
by: Yang, Zheming, et al.
Published: (2025)
MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
by: Yang, Zheming, et al.
Published: (2026)
TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference
by: Zhang, Hongbin, et al.
Published: (2025)
PICO: Pipeline Inference Framework for Versatile CNNs on Diverse Mobile Devices
by: Yang, Xiang, et al.
Published: (2022)
Bandwidth-Aware and Cost-Efficient Pipeline Parallel Scheduling in Geo-Distributed LLM Training
by: Zhang, Han, et al.
Published: (2026)
Priority-Aware Model-Distributed Inference at Edge Networks
by: Li, Teng, et al.
Published: (2024)
Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling
by: Ahmad, Sohaib, et al.
Published: (2024)
RingAda: Pipelining Large Model Fine-Tuning on Edge Devices with Scheduled Layer Unfreezing
by: Li, Liang, et al.
Published: (2025)
Understanding the Performance and Power of LLM Inferencing on Edge Accelerators
by: Arya, Mayank, et al.
Published: (2025)
Accelerating Edge Inference for Distributed MoE Models with Latency-Optimized Expert Placement
by: Wu, Tian, et al.
Published: (2025)
Accelerating End-Cloud Collaborative Inference via Near Bubble-free Pipeline Optimization
by: Gao, Luyao, et al.
Published: (2024)
LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices
by: Sun, Mingyu, et al.
Published: (2025)
Evaluating Container Orchestration for Neuromorphic Workloads in Virtual Edge Environments
by: Pham, Huyen, et al.
Published: (2026)
Fulcrum: Optimizing Concurrent DNN Training and Inferencing on Edge Accelerators
by: K., Prashanthi S., et al.
Published: (2025)
Performance Characterization of Containerized DNN Training and Inference on Edge Accelerators
by: K., Prashanthi S., et al.
Published: (2023)
Decentralized LLM Inference over Edge Networks with Energy Harvesting
by: Khoshsirat, Aria, et al.
Published: (2024)
RAPID: Redundancy-Aware and Compatibility-Optimal Edge-Cloud Partitioned Inference for Diverse VLA Models
by: Zheng, Zihao, et al.
Published: (2026)
Hermes: Memory-Efficient Pipeline Inference for Large Models on Edge Devices
by: Han, Xueyuan, et al.
Published: (2024)
PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks
by: Zhan, Huiyou, et al.
Published: (2025)
SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference
by: He, Yongchao, et al.
Published: (2025)
Squeezing Edge Performance: A Sensitivity-Aware Container Management for Heterogeneous Tasks
by: Zhang, Yongmin, et al.
Published: (2025)
DGNNFlow: A Streaming Dataflow Architecture for Real-Time Edge-based Dynamic GNN Inference in HL-LHC Trigger Systems
by: Maharaj, Davendra, et al.
Published: (2026)
A Decentralized Root Cause Localization Approach for Edge Computing Environments
by: Fernando, Duneesha, et al.
Published: (2025)