:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Zerui; Liu, Yan; Huang, Jun
Format:	Preprint
Published:	2024
Subjects:	Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Networking and Internet Architecture
Online Access:	https://arxiv.org/abs/2411.03376

Similar Items

KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving
by: Liu, Zedong, et al.
Published: (2026)

NebulaFL: Effective Asynchronous Federated Learning for JointCloud Computing
by: Gao, Fei, et al.
Published: (2024)

Generative AI on the Edge: Architecture and Performance Evaluation
by: Nezami, Zeinab, et al.
Published: (2024)

AI Greenferencing: Routing AI Inferencing to Green Modular Data Centers with Heron
by: Reddy, Tella Rajashekhar, et al.
Published: (2025)

Adaptive Parameter-Efficient Federated Fine-Tuning on Heterogeneous Devices
by: Liu, Jun, et al.
Published: (2024)

FogROS2-FT: Fault Tolerant Cloud Robotics
by: Chen, Kaiyuan, et al.
Published: (2024)

High-speed Networking for Giga-Scale AI Factories
by: Khashab, Sajy, et al.
Published: (2026)

Smaller, Smarter, Closer: The Edge of Collaborative Generative AI
by: Morabito, Roberto, et al.
Published: (2025)

Trust-Aware Routing for Distributed Generative AI Inference at the Edge
by: Nguyen, Chanh, et al.
Published: (2026)

Towards Net-Zero Carbon Emissions in Network AI for 6G and Beyond
by: Zhang, Peng, et al.
Published: (2023)

ScaleAcross Explorer: Exploring Communication Optimization for Scale-Across AI Model Training
by: Li, Minghao, et al.
Published: (2026)

SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks
by: Wang, Zhanwei, et al.
Published: (2026)

Meili: Enabling SmartNIC as a Service in the Cloud
by: Su, Qiang, et al.
Published: (2023)

When Digital Twin Meets 6G: Concepts, Obstacles, and Research Prospects
by: Liu, Wenshuai, et al.
Published: (2024)

Rina: Enhancing Ring-AllReduce with In-network Aggregation in Distributed Model Training
by: Chen, Zixuan, et al.
Published: (2024)

An Open-Source Experimentation Framework for the Edge Cloud Continuum
by: Koukis, Georgios, et al.
Published: (2024)

$Λ$-Split: A Privacy-Preserving Split Computing Framework for Cloud-Powered Generative AI
by: Ohta, Shoki, et al.
Published: (2023)

Agentic Performance at the Edge: Insights from Benchmarking
by: Wang, Shiqiang, et al.
Published: (2026)

Towards Edge General Intelligence via Large Language Models: Opportunities and Challenges
by: Chen, Handi, et al.
Published: (2024)

Towards Practical Operation of Deep Reinforcement Learning Agents in Real-World Network Management at Open RAN Edges
by: Li, Haiyuan, et al.
Published: (2024)

HALO: Semantic-Aware Distributed LLM Inference in Lossy Edge Network
by: Zheng, Peirong, et al.
Published: (2026)

PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services
by: Yang, Zheming, et al.
Published: (2024)

Digital Twinning of a Pressurized Water Reactor Startup Operation and Partial Computational Offloading in In-network Computing-Assisted Multiaccess Edge Computing
by: Aliyu, Ibrahim, et al.
Published: (2024)

Context-Aware Orchestration of Energy-Efficient Gossip Learning Schemes
by: Dinani, Mina Aghaei, et al.
Published: (2024)

When IoT Meet LLMs: Applications and Challenges
by: Kok, Ibrahim, et al.
Published: (2024)

Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices
by: Tang, Weiheng, et al.
Published: (2024)

Teola: Towards End-to-End Optimization of LLM-based Applications
by: Tan, Xin, et al.
Published: (2024)

Collective Communication Profiling of Modern-day Machine Learning Workloads
by: Gupta, Jit, et al.
Published: (2025)

Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models
by: Sun, Tingyang, et al.
Published: (2025)

Optimizing Split Learning Latency in TinyML-Based IoT Systems
by: Jenhani, Zied, et al.
Published: (2025)

Cluster Topology-Driven Placement of Experts Reduces Network Traffic in MoE Inference
by: Sivtsov, Danil, et al.
Published: (2025)

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics
by: Ma, Bole, et al.
Published: (2026)

The Implications of Decentralization in Blockchained Federated Learning: Evaluating the Impact of Model Staleness and Inconsistencies
by: Wilhelmi, Francesc, et al.
Published: (2023)

Enabling Reconfiguration-Communication Overlap for Collective Communication in Optical Networks
by: Wu, Changbo, et al.
Published: (2025)

eACGM: Non-instrumented Performance Tracing and Anomaly Detection towards Machine Learning Systems
by: Xu, Ruilin, et al.
Published: (2025)

Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices
by: Ye, Shengyuan, et al.
Published: (2025)

Enabling Intelligent Vehicular Networks Through Distributed Learning in the Non-Terrestrial Networks 6G Vision
by: Naseh, David, et al.
Published: (2023)

XWind: A Cross-site Router for Large Language Model Inference Serving at Renewable Energy Farms
by: Reddy, Tella Rajashekhar, et al.
Published: (2026)

Intelligent Task Offloading: Advanced MEC Task Offloading and Resource Management in 5G Networks
by: Ebrahimi, Alireza, et al.
Published: (2025)

AIvailable: A Software-Defined Architecture for LLM-as-a-Service on Heterogeneous and Legacy GPUs
by: Antunes, Pedro, et al.
Published: (2025)