:: Library Catalog

Bibliographic Details
Main Authors:	Jiang, Linyi; Fu, Silvery D.; Zhu, Yifei; Li, Bo
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.10047

Similar Items

Hyperion: Low-Latency Ultra-HD Video Analytics via Collaborative Vision Transformer Inference
by: Jiang, Linyi, et al.
Published: (2025)

Dynamic Scheduling Strategies for Resource Optimization in Computing Environments
by: Wang, Xiaoye
Published: (2024)

Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization
by: Wu, Ruilong, et al.
Published: (2025)

DIP: Efficient Large Multimodal Model Training with Dynamic Interleaved Pipeline
by: Xue, Zhenliang, et al.
Published: (2025)

Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving
by: Li, Rui, et al.
Published: (2025)

Striking the Right Balance between Compute and Copy: Improving LLM Inferencing Under Speculative Decoding
by: Ramachandran, Arun, et al.
Published: (2025)

FedDCT: A Dynamic Cross-Tier Federated Learning Framework in Wireless Networks
by: Xian, Youquan, et al.
Published: (2023)

A Resource-Adaptive Approach for Federated Learning under Resource-Constrained Environments
by: Zhang, Ruirui, et al.
Published: (2024)

Dynamic Resource Allocation for Virtual Machine Migration Optimization using Machine Learning
by: Gong, Yulu, et al.
Published: (2024)

Collaborative Split Federated Learning with Parallel Training and Aggregation
by: Papageorgiou, Yiannis, et al.
Published: (2025)

SFPrompt: Communication-Efficient Split Federated Fine-Tuning for Large Pre-Trained Models over Resource-Limited Devices
by: Cao, Linxiao, et al.
Published: (2024)

PackKV: Reducing KV Cache Memory Footprint through LLM-Aware Lossy Compression
by: Jiang, Bo, et al.
Published: (2025)

KVComp: A High-Performance, LLM-Aware, Lossy Compression Framework for KV Cache
by: Jiang, Bo, et al.
Published: (2025)

PolyKAN: Efficient Fused GPU Operators for Polynomial Kolmogorov-Arnold Network Variants
by: Yu, Mingkun, et al.
Published: (2025)

AI-Driven Cloud Resource Optimization for Multi-Cluster Environments
by: Punniyamoorthy, Vinoth, et al.
Published: (2025)

ECCENTRIC: Edge-Cloud Collaboration Framework for Distributed Inference Using Knowledge Adaptation
by: Kamani, Mohammad Mahdi, et al.
Published: (2025)

Cooperative Cognitive Dynamic System in UAV Swarms: Reconfigurable Mechanism and Framework
by: Jia, Ziye, et al.
Published: (2024)

xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
by: Fang, Jiarui, et al.
Published: (2024)

Transforming Future Data Center Operations and Management via Physical AI
by: Cao, Zhiwei, et al.
Published: (2025)

Hardware Utilization and Inference Performance of Edge Object Detection Under Fault Injection
by: Pasandideh, Faezeh, et al.
Published: (2026)

Deploying Graph Neural Networks in Wireless Networks: A Link Stability Viewpoint
by: Li, Jun, et al.
Published: (2024)

Adaptive Fault Tolerance Mechanisms of Large Language Models in Cloud Computing Environments
by: Jin, Yihong, et al.
Published: (2025)

Astra: Efficient and Money-saving Automatic Parallel Strategies Search on Heterogeneous GPUs
by: Wang, Peiran, et al.
Published: (2025)

Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study
by: Wang, Zhe, et al.
Published: (2024)

Benchmarking Federated Learning in Edge Computing Environments: A Systematic Review and Performance Evaluation
by: Aribe Jr., Sales, et al.
Published: (2026)

Mesh-Attention: A New Communication-Efficient Distributed Attention with Improved Data Locality
by: Chen, Sirui, et al.
Published: (2025)

Seesaw: High-throughput LLM Inference via Model Re-sharding
by: Su, Qidong, et al.
Published: (2025)

InfiniPipe: Elastic Pipeline Parallelism for Efficient Variable-Length Long-Context LLM Training
by: Wang, Shiju, et al.
Published: (2025)

Decentralized AI: Permissionless LLM Inference on POKT Network
by: Olshansky, Daniel, et al.
Published: (2024)

Demystifying the Communication Characteristics for Distributed Transformer Models
by: Anthony, Quentin, et al.
Published: (2024)

KAITIAN: A Unified Communication Framework for Enabling Efficient Collaboration Across Heterogeneous Accelerators in Embodied AI Systems
by: Lin, Jieke, et al.
Published: (2025)

Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments
by: Liu, Junming, et al.
Published: (2025)

High-Dimensional Data Processing: Benchmarking Machine Learning and Deep Learning Architectures in Local and Distributed Environments
by: Rodriguez, Julian, et al.
Published: (2025)

ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks
by: Shi, Ziji, et al.
Published: (2024)

LLM Inference Serving: Survey of Recent Advances and Opportunities
by: Li, Baolin, et al.
Published: (2024)

Transformer-Based Model for Cold Start Mitigation in FaaS Architecture
by: Mouen, Alexandre Savi Fayam Mbala, et al.
Published: (2025)

HadaCore: Tensor Core Accelerated Hadamard Transform Kernel
by: Agarwal, Krish, et al.
Published: (2024)

MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints
by: Yuan, Yichao, et al.
Published: (2025)

Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment
by: Qazi, Muhammad Azlan, et al.
Published: (2026)

FedSAC: Dynamic Submodel Allocation for Collaborative Fairness in Federated Learning
by: Wang, Zihui, et al.
Published: (2024)