Saved in:
| Main Authors: | Liu, Zhibang, Xu, Chaonong, Lv, Zhenjie, Liu, Zhizhuo, Zhao, Suyu |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.04489 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Cooperative Inference with Interleaved Operator Partitioning for CNNs
by: Liu, Zhibang, et al.
Published: (2024)
by: Liu, Zhibang, et al.
Published: (2024)
Accelerating End-Cloud Collaborative Inference via Near Bubble-free Pipeline Optimization
by: Gao, Luyao, et al.
Published: (2024)
by: Gao, Luyao, et al.
Published: (2024)
Many Hands Make Light Work: Accelerating Edge Inference via Multi-Client Collaborative Caching
by: Liang, Wenyi, et al.
Published: (2024)
by: Liang, Wenyi, et al.
Published: (2024)
Learning the Optimal Path and DNN Partition for Collaborative Edge Inference
by: Huang, Yin, et al.
Published: (2024)
by: Huang, Yin, et al.
Published: (2024)
LIME:Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices
by: Sun, Mingyu, et al.
Published: (2025)
by: Sun, Mingyu, et al.
Published: (2025)
Accelerating Heterogeneous Tensor Parallelism via Flexible Workload Control
by: Wang, Zhigang, et al.
Published: (2024)
by: Wang, Zhigang, et al.
Published: (2024)
Amoeba: Runtime Tensor Parallel Transformation for LLM Inference Services
by: Chen, Haoyu, et al.
Published: (2025)
by: Chen, Haoyu, et al.
Published: (2025)
Collaborative Inference for Large Models with Task Offloading and Early Exiting
by: Xie, Zuan, et al.
Published: (2024)
by: Xie, Zuan, et al.
Published: (2024)
Collaborative Speculative Inference for Efficient LLM Inference Serving
by: Gao, Luyao, et al.
Published: (2025)
by: Gao, Luyao, et al.
Published: (2025)
Accelerating Sparse MTTKRP for Small Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2025)
by: Wijeratne, Sasindu, et al.
Published: (2025)
Accelerating Distributed MoE Training and Inference with Lina
by: Li, Jiamin, et al.
Published: (2022)
by: Li, Jiamin, et al.
Published: (2022)
Scaling LLM Inference Beyond Amdahl`s Limits via Eliminating Non-Scalable Overheads
by: Zhao, Alan, et al.
Published: (2026)
by: Zhao, Alan, et al.
Published: (2026)
Minions: Accelerating Large Language Model Inference with Aggregated Speculative Execution
by: Wang, Siqi, et al.
Published: (2024)
by: Wang, Siqi, et al.
Published: (2024)
AnchorTP: Resilient LLM Inference with State-Preserving Elastic Tensor Parallelism
by: Xu, Wendong, et al.
Published: (2025)
by: Xu, Wendong, et al.
Published: (2025)
PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration
by: Pacheco, Daniel, et al.
Published: (2026)
by: Pacheco, Daniel, et al.
Published: (2026)
Accelerating Drug Discovery in AutoDock-GPU with Tensor Cores
by: Schieffer, Gabin, et al.
Published: (2024)
by: Schieffer, Gabin, et al.
Published: (2024)
Where to Split? A Pareto-Front Analysis of DNN Partitioning for Edge Inference
by: Masud, Adiba, et al.
Published: (2026)
by: Masud, Adiba, et al.
Published: (2026)
Online Optimization of DNN Inference Network Utility in Collaborative Edge Computing
by: Li, Rui, et al.
Published: (2024)
by: Li, Rui, et al.
Published: (2024)
MOPAR: A Model Partitioning Framework for Deep Learning Inference Services on Serverless Platforms
by: Duan, Jiaang, et al.
Published: (2024)
by: Duan, Jiaang, et al.
Published: (2024)
Federated Inference for Heterogeneous LLM Communication and Collaboration
by: Chen, Zihan, et al.
Published: (2026)
by: Chen, Zihan, et al.
Published: (2026)
LMDeploy Accelerates Mixed-Precision LLM Inference with TurboMind
by: Zhang, Li, et al.
Published: (2025)
by: Zhang, Li, et al.
Published: (2025)
AMPED: Accelerating MTTKRP for Billion-Scale Sparse Tensor Decomposition on Multiple GPUs
by: Wijeratne, Sasindu, et al.
Published: (2025)
by: Wijeratne, Sasindu, et al.
Published: (2025)
Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference
by: Luo, Shuqing, et al.
Published: (2025)
by: Luo, Shuqing, et al.
Published: (2025)
Understanding the Performance and Power of LLM Inferencing on Edge Accelerators
by: Arya, Mayank, et al.
Published: (2025)
by: Arya, Mayank, et al.
Published: (2025)
MoA-Off: Adaptive Heterogeneous Modality-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
by: Yang, Zheming, et al.
Published: (2025)
by: Yang, Zheming, et al.
Published: (2025)
SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling
by: Lv, Cunchi, et al.
Published: (2025)
by: Lv, Cunchi, et al.
Published: (2025)
Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning
by: Liu, Hangda, et al.
Published: (2025)
by: Liu, Hangda, et al.
Published: (2025)
Accelerating Edge Inference for Distributed MoE Models with Latency-Optimized Expert Placement
by: Wu, Tian, et al.
Published: (2025)
by: Wu, Tian, et al.
Published: (2025)
MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
by: Yang, Zheming, et al.
Published: (2026)
by: Yang, Zheming, et al.
Published: (2026)
Fulcrum: Optimizing Concurrent DNN Training and Inferencing on Edge Accelerators
by: K., Prashanthi S., et al.
Published: (2025)
by: K., Prashanthi S., et al.
Published: (2025)
Performance Characterization of Containerized DNN Training and Inference on Edge Accelerators
by: K., Prashanthi S., et al.
Published: (2023)
by: K., Prashanthi S., et al.
Published: (2023)
Accelerating OpenPangu Inference on NPU via Speculative Decoding
by: Dai, Yuntao, et al.
Published: (2026)
by: Dai, Yuntao, et al.
Published: (2026)
EdgeShard: Efficient LLM Inference via Collaborative Edge Computing
by: Zhang, Mingjin, et al.
Published: (2024)
by: Zhang, Mingjin, et al.
Published: (2024)
Communication-Efficient Collaborative LLM Inference over LEO Satellite Networks
by: Zhang, Songge, et al.
Published: (2026)
by: Zhang, Songge, et al.
Published: (2026)
Collaborative Inference in DNN-based Satellite Systems with Dynamic Task Streams
by: Guan, Jinglong, et al.
Published: (2023)
by: Guan, Jinglong, et al.
Published: (2023)
Federated Learning Using Coupled Tensor Train Decomposition
by: Zhang, Xiangtao, et al.
Published: (2024)
by: Zhang, Xiangtao, et al.
Published: (2024)
WindGP: Efficient Graph Partitioning on Heterogenous Machines
by: Zeng, Li, et al.
Published: (2024)
by: Zeng, Li, et al.
Published: (2024)
Evaluating Multi-Instance DNN Inferencing on Multiple Accelerators of an Edge Device
by: Tayal, Mumuksh, et al.
Published: (2025)
by: Tayal, Mumuksh, et al.
Published: (2025)
Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding
by: Wang, Zhibin, et al.
Published: (2025)
by: Wang, Zhibin, et al.
Published: (2025)
SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models
by: Chen, Fahao, et al.
Published: (2025)
by: Chen, Fahao, et al.
Published: (2025)
Similar Items
-
Cooperative Inference with Interleaved Operator Partitioning for CNNs
by: Liu, Zhibang, et al.
Published: (2024) -
Accelerating End-Cloud Collaborative Inference via Near Bubble-free Pipeline Optimization
by: Gao, Luyao, et al.
Published: (2024) -
Many Hands Make Light Work: Accelerating Edge Inference via Multi-Client Collaborative Caching
by: Liang, Wenyi, et al.
Published: (2024) -
Learning the Optimal Path and DNN Partition for Collaborative Edge Inference
by: Huang, Yin, et al.
Published: (2024) -
LIME:Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices
by: Sun, Mingyu, et al.
Published: (2025)