:: Library Catalog

Bibliographic Details
Main Authors:	O'Quinn, Austin; Snedeker, Conor; Zhang, Siyuan; Kline, Jenna
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2503.03070

Similar Items

ACE-GNN: Adaptive GNN Co-Inference with System-Aware Scheduling in Dynamic Edge Environments
by: Zhou, Ao, et al.
Published: (2025)

Private Model Personalization Revisited
by: Snedeker, Conor, et al.
Published: (2025)

Adaptive Configuration Selection for Multi-Model Inference Pipelines in Edge Computing
by: Sheng, Jinhao, et al.
Published: (2025)

A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference
by: Zhang, Yida, et al.
Published: (2026)

PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding
by: Han, Yunhe, et al.
Published: (2026)

Toward Sustainability-Aware LLM Inference on Edge Clusters
by: Rajashekar, Kolichala, et al.
Published: (2025)

Modular Foundation Model Inference at the Edge: Network-Aware Microservice Optimization
by: Zhu, Juan, et al.
Published: (2026)

OCTOPINF: Workload-Aware Inference Serving for Edge Video Analytics
by: Nguyen, Thanh-Tung, et al.
Published: (2025)

Power Aware Dynamic Reallocation For Inference
by: Jiang, Yiwei, et al.
Published: (2026)

Preemption Aware Task Scheduling for Priority and Deadline Constrained DNN Inference Task Offloading in Homogeneous Mobile-Edge Networks
by: Cotter, Jamie, et al.
Published: (2025)

Infer-EDGE: Dynamic DNN Inference Optimization in 'Just-in-time' Edge-AI Implementations
by: Mounesan, Motahare, et al.
Published: (2025)

SneakPeek: Data-Aware Model Selection and Scheduling for Inference Serving on the Edge
by: Wolfrath, Joel, et al.
Published: (2025)

EdgeShard: Efficient LLM Inference via Collaborative Edge Computing
by: Zhang, Mingjin, et al.
Published: (2024)

Communication-Computation Pipeline Parallel Split Learning over Wireless Edge Networks
by: Liu, Chenyu, et al.
Published: (2025)

EdgeServing: Deadline-Aware Multi-DNN Serving at the Edge
by: Cao, Jiahe, et al.
Published: (2026)

Cicada: A Pipeline-Efficient Approach to Serverless Inference with Decoupled Management
by: Wu, Z., et al.
Published: (2025)

PARD: Enhancing Goodput for Inference Pipeline via Proactive Request Dropping
by: Zhao, Zhixin, et al.
Published: (2026)

MoA-Off: Adaptive Heterogeneous Modality-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
by: Yang, Zheming, et al.
Published: (2025)

MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
by: Yang, Zheming, et al.
Published: (2026)

TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference
by: Zhang, Hongbin, et al.
Published: (2025)

PICO: Pipeline Inference Framework for Versatile CNNs on Diverse Mobile Devices
by: Yang, Xiang, et al.
Published: (2022)

Bandwidth-Aware and Cost-Efficient Pipeline Parallel Scheduling in Geo-Distributed LLM Training
by: Zhang, Han, et al.
Published: (2026)

Priority-Aware Model-Distributed Inference at Edge Networks
by: Li, Teng, et al.
Published: (2024)

Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling
by: Ahmad, Sohaib, et al.
Published: (2024)

RingAda: Pipelining Large Model Fine-Tuning on Edge Devices with Scheduled Layer Unfreezing
by: Li, Liang, et al.
Published: (2025)

Understanding the Performance and Power of LLM Inferencing on Edge Accelerators
by: Arya, Mayank, et al.
Published: (2025)

Accelerating Edge Inference for Distributed MoE Models with Latency-Optimized Expert Placement
by: Wu, Tian, et al.
Published: (2025)

Accelerating End-Cloud Collaborative Inference via Near Bubble-free Pipeline Optimization
by: Gao, Luyao, et al.
Published: (2024)

LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices
by: Sun, Mingyu, et al.
Published: (2025)

Evaluating Container Orchestration for Neuromorphic Workloads in Virtual Edge Environments
by: Pham, Huyen, et al.
Published: (2026)

Fulcrum: Optimizing Concurrent DNN Training and Inferencing on Edge Accelerators
by: K., Prashanthi S., et al.
Published: (2025)

Performance Characterization of Containerized DNN Training and Inference on Edge Accelerators
by: K., Prashanthi S., et al.
Published: (2023)

Decentralized LLM Inference over Edge Networks with Energy Harvesting
by: Khoshsirat, Aria, et al.
Published: (2024)

RAPID: Redundancy-Aware and Compatibility-Optimal Edge-Cloud Partitioned Inference for Diverse VLA Models
by: Zheng, Zihao, et al.
Published: (2026)

Hermes: Memory-Efficient Pipeline Inference for Large Models on Edge Devices
by: Han, Xueyuan, et al.
Published: (2024)

PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks
by: Zhan, Huiyou, et al.
Published: (2025)

SiPipe: Bridging the CPU-GPU Utilization Gap for Efficient Pipeline-Parallel LLM Inference
by: He, Yongchao, et al.
Published: (2025)

Squeezing Edge Performance: A Sensitivity-Aware Container Management for Heterogeneous Tasks
by: Zhang, Yongmin, et al.
Published: (2025)

DGNNFlow: A Streaming Dataflow Architecture for Real-Time Edge-based Dynamic GNN Inference in HL-LHC Trigger Systems
by: Maharaj, Davendra, et al.
Published: (2026)

A Decentralized Root Cause Localization Approach for Edge Computing Environments
by: Fernando, Duneesha, et al.
Published: (2025)