Library Catalog

Bibliographic Details
Main Authors:	Hu, Pengfei; Qian, Yuhang; Zheng, Tianyue; Li, Ang; Chen, Zhe; Gao, Yue; Cheng, Xiuzhen; Luo, Jun
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Machine Learning; Robotics
Online Access:	https://arxiv.org/abs/2410.09747

Similar Items

PIR-DSN: A Decentralized Storage Network Supporting Private Information Retrieval
by: Zhang, Jiahao, et al.
Published: (2025)

OrbitBFT: Enabling Scalable and Robust BFT Consensus in LEO Constellations
by: Sun, Tianyi, et al.
Published: (2026)

A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference
by: Zhang, Yida, et al.
Published: (2026)

MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
by: Yang, Zheming, et al.
Published: (2026)

CoCoDiff: Optimizing Collective Communications for Distributed Diffusion Transformer Inference Under Ulysses Sequence Parallelism
by: Ma, Bin, et al.
Published: (2026)

Amoeba: Runtime Tensor Parallel Transformation for LLM Inference Services
by: Chen, Haoyu, et al.
Published: (2025)

Communication Efficient and Provable Federated Unlearning
by: Tao, Youming, et al.
Published: (2024)

Power Aware Dynamic Reallocation For Inference
by: Jiang, Yiwei, et al.
Published: (2026)

Distributed Bilevel Optimization with Dual Pruning for Resource-limited Clients
by: Li, Mingyi, et al.
Published: (2025)

SkyMemory: A LEO Edge Cache for Transformer Inference Optimization and Scale Out
by: Sandholm, Thomas, et al.
Published: (2025)

Efficient Distributed MLLM Training with Cornstarch
by: Jang, Insu, et al.
Published: (2025)

RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference
by: Karfakis, George, et al.
Published: (2025)

MoA-Off: Adaptive Heterogeneous Modality-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
by: Yang, Zheming, et al.
Published: (2025)

Understanding the Performance and Power of LLM Inferencing on Edge Accelerators
by: Arya, Mayank, et al.
Published: (2025)

FlexPie: Accelerate Distributed Inference on Edge Devices with Flexible Combinatorial Optimization [Technical Report]
by: Zhang, Runhua, et al.
Published: (2025)

Folding Tensor and Sequence Parallelism for Memory-Efficient Transformer Training & Inference
by: Shyam, Vasu, et al.
Published: (2026)

Asynchronous BFT Consensus Made Wireless
by: Liu, Shuo, et al.
Published: (2025)

SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models
by: Lin, Zheng, et al.
Published: (2024)

PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding
by: Han, Yunhe, et al.
Published: (2026)

Torpor: GPU-Enabled Serverless Computing for Low-Latency, Resource-Efficient Inference
by: Yu, Minchen, et al.
Published: (2023)

FourierCompress: Layer-Aware Spectral Activation Compression for Efficient and Accurate Collaborative LLM Inference
by: Ma, Jian, et al.
Published: (2025)

Metronome: Efficient Scheduling for Periodic Traffic Jobs with Network and Priority Awareness
by: Jiang, Hao, et al.
Published: (2025)

MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines
by: Gao, Lei, et al.
Published: (2024)

SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving
by: Zhou, Qihui, et al.
Published: (2025)

Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving
by: Liu, Ziming, et al.
Published: (2025)

ReaLB: Real-Time Load Balancing for Multimodal MoE Inference
by: Wang, Yingping, et al.
Published: (2026)

RServe: Overlapping Encoding and Prefill for Efficient LMM Inference
by: Guo, Tianyu, et al.
Published: (2025)

Towards Resource-Efficient Serverless LLM Inference with SLINFER
by: Xu, Chuhao, et al.
Published: (2025)

Efficient LLM Inference with Activation Checkpointing and Hybrid Caching
by: Lee, Sanghyeon, et al.
Published: (2025)

Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding
by: Wang, Zhibin, et al.
Published: (2025)

Staleness-Centric Optimizations for Parallel Diffusion MoE Inference
by: Luo, Jiajun, et al.
Published: (2024)

From Servers to Sites: Compositional Power Trace Generation of LLM Inference for Infrastructure Planning
by: Wilkins, Grant, et al.
Published: (2026)

FedRFQ: Prototype-Based Federated Learning with Reduced Redundancy, Minimal Failure, and Enhanced Quality
by: Yan, Biwei, et al.
Published: (2024)

UELLM: A Unified and Efficient Approach for LLM Inference Serving
by: He, Yiyuan, et al.
Published: (2024)

RIPPLE++: An Incremental Framework for Efficient GNN Inference on Evolving Graphs
by: Naman, Pranjal, et al.
Published: (2026)

Parallax: Efficient LLM Inference Service over Decentralized Environment
by: Tong, Chris, et al.
Published: (2025)

Efficient Multi-round LLM Inference over Disaggregated Serving
by: He, Wenhao, et al.
Published: (2026)

HAP: Hybrid Adaptive Parallelism for Efficient Mixture-of-Experts Inference
by: Lin, Haoran, et al.
Published: (2025)

Collaborative Speculative Inference for Efficient LLM Inference Serving
by: Gao, Luyao, et al.
Published: (2025)

EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
by: Qian, Yulei, et al.
Published: (2024)