| Main Authors: | Jiang, Linyi, Fu, Silvery D., Zhu, Yifei, Li, Bo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.10047 |
Similar Items
Hyperion: Low-Latency Ultra-HD Video Analytics via Collaborative Vision Transformer Inference
by: Jiang, Linyi, et al.
Published: (2025)
Dynamic Scheduling Strategies for Resource Optimization in Computing Environments
by: Wang, Xiaoye
Published: (2024)
Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization
by: Wu, Ruilong, et al.
Published: (2025)
DIP: Efficient Large Multimodal Model Training with Dynamic Interleaved Pipeline
by: Xue, Zhenliang, et al.
Published: (2025)
Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving
by: Li, Rui, et al.
Published: (2025)
Striking the Right Balance between Compute and Copy: Improving LLM Inferencing Under Speculative Decoding
by: Ramachandran, Arun, et al.
Published: (2025)
FedDCT: A Dynamic Cross-Tier Federated Learning Framework in Wireless Networks
by: Xian, Youquan, et al.
Published: (2023)
A Resource-Adaptive Approach for Federated Learning under Resource-Constrained Environments
by: Zhang, Ruirui, et al.
Published: (2024)
Dynamic Resource Allocation for Virtual Machine Migration Optimization using Machine Learning
by: Gong, Yulu, et al.
Published: (2024)
Collaborative Split Federated Learning with Parallel Training and Aggregation
by: Papageorgiou, Yiannis, et al.
Published: (2025)
SFPrompt: Communication-Efficient Split Federated Fine-Tuning for Large Pre-Trained Models over Resource-Limited Devices
by: Cao, Linxiao, et al.
Published: (2024)
PackKV: Reducing KV Cache Memory Footprint through LLM-Aware Lossy Compression
by: Jiang, Bo, et al.
Published: (2025)
KVComp: A High-Performance, LLM-Aware, Lossy Compression Framework for KV Cache
by: Jiang, Bo, et al.
Published: (2025)
PolyKAN: Efficient Fused GPU Operators for Polynomial Kolmogorov-Arnold Network Variants
by: Yu, Mingkun, et al.
Published: (2025)
AI-Driven Cloud Resource Optimization for Multi-Cluster Environments
by: Punniyamoorthy, Vinoth, et al.
Published: (2025)
ECCENTRIC: Edge-Cloud Collaboration Framework for Distributed Inference Using Knowledge Adaptation
by: Kamani, Mohammad Mahdi, et al.
Published: (2025)
Cooperative Cognitive Dynamic System in UAV Swarms: Reconfigurable Mechanism and Framework
by: Jia, Ziye, et al.
Published: (2024)
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
by: Fang, Jiarui, et al.
Published: (2024)
Transforming Future Data Center Operations and Management via Physical AI
by: Cao, Zhiwei, et al.
Published: (2025)
Hardware Utilization and Inference Performance of Edge Object Detection Under Fault Injection
by: Pasandideh, Faezeh, et al.
Published: (2026)
Deploying Graph Neural Networks in Wireless Networks: A Link Stability Viewpoint
by: Li, Jun, et al.
Published: (2024)
Adaptive Fault Tolerance Mechanisms of Large Language Models in Cloud Computing Environments
by: Jin, Yihong, et al.
Published: (2025)
Astra: Efficient and Money-saving Automatic Parallel Strategies Search on Heterogeneous GPUs
by: Wang, Peiran, et al.
Published: (2025)
Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study
by: Wang, Zhe, et al.
Published: (2024)
Benchmarking Federated Learning in Edge Computing Environments: A Systematic Review and Performance Evaluation
by: Aribe Jr., Sales, et al.
Published: (2026)
Mesh-Attention: A New Communication-Efficient Distributed Attention with Improved Data Locality
by: Chen, Sirui, et al.
Published: (2025)
Seesaw: High-throughput LLM Inference via Model Re-sharding
by: Su, Qidong, et al.
Published: (2025)
InfiniPipe: Elastic Pipeline Parallelism for Efficient Variable-Length Long-Context LLM Training
by: Wang, Shiju, et al.
Published: (2025)
Decentralized AI: Permissionless LLM Inference on POKT Network
by: Olshansky, Daniel, et al.
Published: (2024)
Demystifying the Communication Characteristics for Distributed Transformer Models
by: Anthony, Quentin, et al.
Published: (2024)
KAITIAN: A Unified Communication Framework for Enabling Efficient Collaboration Across Heterogeneous Accelerators in Embodied AI Systems
by: Lin, Jieke, et al.
Published: (2025)
Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments
by: Liu, Junming, et al.
Published: (2025)
High-Dimensional Data Processing: Benchmarking Machine Learning and Deep Learning Architectures in Local and Distributed Environments
by: Rodriguez, Julian, et al.
Published: (2025)
ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks
by: Shi, Ziji, et al.
Published: (2024)
LLM Inference Serving: Survey of Recent Advances and Opportunities
by: Li, Baolin, et al.
Published: (2024)
Transformer-Based Model for Cold Start Mitigation in FaaS Architecture
by: Mouen, Alexandre Savi Fayam Mbala, et al.
Published: (2025)
HadaCore: Tensor Core Accelerated Hadamard Transform Kernel
by: Agarwal, Krish, et al.
Published: (2024)
MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints
by: Yuan, Yichao, et al.
Published: (2025)
Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment
by: Qazi, Muhammad Azlan, et al.
Published: (2026)
FedSAC: Dynamic Submodel Allocation for Collaborative Fairness in Federated Learning
by: Wang, Zihui, et al.
Published: (2024)