Library Catalog

Bibliographic Details
Main Authors:	Liu, Zhibang, Xu, Chaonong, Lv, Zhenjie, Liu, Zhizhuo, Zhao, Suyu
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2501.04489

Similar Items

Cooperative Inference with Interleaved Operator Partitioning for CNNs
by: Liu, Zhibang, et al.
Published: (2024)

Accelerating End-Cloud Collaborative Inference via Near Bubble-free Pipeline Optimization
by: Gao, Luyao, et al.
Published: (2024)

Many Hands Make Light Work: Accelerating Edge Inference via Multi-Client Collaborative Caching
by: Liang, Wenyi, et al.
Published: (2024)

Learning the Optimal Path and DNN Partition for Collaborative Edge Inference
by: Huang, Yin, et al.
Published: (2024)

LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices
by: Sun, Mingyu, et al.
Published: (2025)

Accelerating Heterogeneous Tensor Parallelism via Flexible Workload Control
by: Wang, Zhigang, et al.
Published: (2024)

Amoeba: Runtime Tensor Parallel Transformation for LLM Inference Services
by: Chen, Haoyu, et al.
Published: (2025)

Collaborative Inference for Large Models with Task Offloading and Early Exiting
by: Xie, Zuan, et al.
Published: (2024)

Collaborative Speculative Inference for Efficient LLM Inference Serving
by: Gao, Luyao, et al.
Published: (2025)

Accelerating Sparse MTTKRP for Small Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2025)

Accelerating Distributed MoE Training and Inference with Lina
by: Li, Jiamin, et al.
Published: (2022)

Scaling LLM Inference Beyond Amdahl's Limits via Eliminating Non-Scalable Overheads
by: Zhao, Alan, et al.
Published: (2026)

Minions: Accelerating Large Language Model Inference with Aggregated Speculative Execution
by: Wang, Siqi, et al.
Published: (2024)

AnchorTP: Resilient LLM Inference with State-Preserving Elastic Tensor Parallelism
by: Xu, Wendong, et al.
Published: (2025)

PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration
by: Pacheco, Daniel, et al.
Published: (2026)

Accelerating Drug Discovery in AutoDock-GPU with Tensor Cores
by: Schieffer, Gabin, et al.
Published: (2024)

Where to Split? A Pareto-Front Analysis of DNN Partitioning for Edge Inference
by: Masud, Adiba, et al.
Published: (2026)

Online Optimization of DNN Inference Network Utility in Collaborative Edge Computing
by: Li, Rui, et al.
Published: (2024)

MOPAR: A Model Partitioning Framework for Deep Learning Inference Services on Serverless Platforms
by: Duan, Jiaang, et al.
Published: (2024)

Federated Inference for Heterogeneous LLM Communication and Collaboration
by: Chen, Zihan, et al.
Published: (2026)

LMDeploy Accelerates Mixed-Precision LLM Inference with TurboMind
by: Zhang, Li, et al.
Published: (2025)

AMPED: Accelerating MTTKRP for Billion-Scale Sparse Tensor Decomposition on Multiple GPUs
by: Wijeratne, Sasindu, et al.
Published: (2025)

Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference
by: Luo, Shuqing, et al.
Published: (2025)

Understanding the Performance and Power of LLM Inferencing on Edge Accelerators
by: Arya, Mayank, et al.
Published: (2025)

MoA-Off: Adaptive Heterogeneous Modality-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
by: Yang, Zheming, et al.
Published: (2025)

SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling
by: Lv, Cunchi, et al.
Published: (2025)

Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning
by: Liu, Hangda, et al.
Published: (2025)

Accelerating Edge Inference for Distributed MoE Models with Latency-Optimized Expert Placement
by: Wu, Tian, et al.
Published: (2025)

MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
by: Yang, Zheming, et al.
Published: (2026)

Fulcrum: Optimizing Concurrent DNN Training and Inferencing on Edge Accelerators
by: K., Prashanthi S., et al.
Published: (2025)

Performance Characterization of Containerized DNN Training and Inference on Edge Accelerators
by: K., Prashanthi S., et al.
Published: (2023)

Accelerating OpenPangu Inference on NPU via Speculative Decoding
by: Dai, Yuntao, et al.
Published: (2026)

EdgeShard: Efficient LLM Inference via Collaborative Edge Computing
by: Zhang, Mingjin, et al.
Published: (2024)

Communication-Efficient Collaborative LLM Inference over LEO Satellite Networks
by: Zhang, Songge, et al.
Published: (2026)

Collaborative Inference in DNN-based Satellite Systems with Dynamic Task Streams
by: Guan, Jinglong, et al.
Published: (2023)

Federated Learning Using Coupled Tensor Train Decomposition
by: Zhang, Xiangtao, et al.
Published: (2024)

WindGP: Efficient Graph Partitioning on Heterogenous Machines
by: Zeng, Li, et al.
Published: (2024)

Evaluating Multi-Instance DNN Inferencing on Multiple Accelerators of an Edge Device
by: Tayal, Mumuksh, et al.
Published: (2025)

Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding
by: Wang, Zhibin, et al.
Published: (2025)

SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models
by: Chen, Fahao, et al.
Published: (2025)