| Main Authors: | Hu, Pengfei, Qian, Yuhang, Zheng, Tianyue, Li, Ang, Chen, Zhe, Gao, Yue, Cheng, Xiuzhen, Luo, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.09747 |
Similar Items
PIR-DSN: A Decentralized Storage Network Supporting Private Information Retrieval
by: Zhang, Jiahao, et al.
Published: (2025)
OrbitBFT: Enabling Scalable and Robust BFT Consensus in LEO Constellations
by: Sun, Tianyi, et al.
Published: (2026)
A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference
by: Zhang, Yida, et al.
Published: (2026)
MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
by: Yang, Zheming, et al.
Published: (2026)
CoCoDiff: Optimizing Collective Communications for Distributed Diffusion Transformer Inference Under Ulysses Sequence Parallelism
by: Ma, Bin, et al.
Published: (2026)
Amoeba: Runtime Tensor Parallel Transformation for LLM Inference Services
by: Chen, Haoyu, et al.
Published: (2025)
Communication Efficient and Provable Federated Unlearning
by: Tao, Youming, et al.
Published: (2024)
Power Aware Dynamic Reallocation For Inference
by: Jiang, Yiwei, et al.
Published: (2026)
Distributed Bilevel Optimization with Dual Pruning for Resource-limited Clients
by: Li, Mingyi, et al.
Published: (2025)
SkyMemory: A LEO Edge Cache for Transformer Inference Optimization and Scale Out
by: Sandholm, Thomas, et al.
Published: (2025)
Efficient Distributed MLLM Training with Cornstarch
by: Jang, Insu, et al.
Published: (2025)
RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference
by: Karfakis, George, et al.
Published: (2025)
MoA-Off: Adaptive Heterogeneous Modality-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference
by: Yang, Zheming, et al.
Published: (2025)
Understanding the Performance and Power of LLM Inferencing on Edge Accelerators
by: Arya, Mayank, et al.
Published: (2025)
FlexPie: Accelerate Distributed Inference on Edge Devices with Flexible Combinatorial Optimization[Technical Report]
by: Zhang, Runhua, et al.
Published: (2025)
Folding Tensor and Sequence Parallelism for Memory-Efficient Transformer Training & Inference
by: Shyam, Vasu, et al.
Published: (2026)
Asynchronous BFT Consensus Made Wireless
by: Liu, Shuo, et al.
Published: (2025)
SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models
by: Lin, Zheng, et al.
Published: (2024)
PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding
by: Han, Yunhe, et al.
Published: (2026)
Torpor: GPU-Enabled Serverless Computing for Low-Latency, Resource-Efficient Inference
by: Yu, Minchen, et al.
Published: (2023)
FourierCompress: Layer-Aware Spectral Activation Compression for Efficient and Accurate Collaborative LLM Inference
by: Ma, Jian, et al.
Published: (2025)
Metronome: Efficient Scheduling for Periodic Traffic Jobs with Network and Priority Awareness
by: Jiang, Hao, et al.
Published: (2025)
MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines
by: Gao, Lei, et al.
Published: (2024)
SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving
by: Zhou, Qihui, et al.
Published: (2025)
Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving
by: Liu, Ziming, et al.
Published: (2025)
ReaLB: Real-Time Load Balancing for Multimodal MoE Inference
by: Wang, Yingping, et al.
Published: (2026)
RServe: Overlapping Encoding and Prefill for Efficient LMM Inference
by: Guo, Tianyu, et al.
Published: (2025)
Towards Resource-Efficient Serverless LLM Inference with SLINFER
by: Xu, Chuhao, et al.
Published: (2025)
Efficient LLM Inference with Activation Checkpointing and Hybrid Caching
by: Lee, Sanghyeon, et al.
Published: (2025)
Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding
by: Wang, Zhibin, et al.
Published: (2025)
Staleness-Centric Optimizations for Parallel Diffusion MoE Inference
by: Luo, Jiajun, et al.
Published: (2024)
From Servers to Sites: Compositional Power Trace Generation of LLM Inference for Infrastructure Planning
by: Wilkins, Grant, et al.
Published: (2026)
FedRFQ: Prototype-Based Federated Learning with Reduced Redundancy, Minimal Failure, and Enhanced Quality
by: Yan, Biwei, et al.
Published: (2024)
UELLM: A Unified and Efficient Approach for LLM Inference Serving
by: He, Yiyuan, et al.
Published: (2024)
RIPPLE++: An Incremental Framework for Efficient GNN Inference on Evolving Graphs
by: Naman, Pranjal, et al.
Published: (2026)
Parallax: Efficient LLM Inference Service over Decentralized Environment
by: Tong, Chris, et al.
Published: (2025)
Efficient Multi-round LLM Inference over Disaggregated Serving
by: He, Wenhao, et al.
Published: (2026)
HAP: Hybrid Adaptive Parallelism for Efficient Mixture-of-Experts Inference
by: Lin, Haoran, et al.
Published: (2025)
Collaborative Speculative Inference for Efficient LLM Inference Serving
by: Gao, Luyao, et al.
Published: (2025)
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
by: Qian, Yulei, et al.
Published: (2024)