| Main Authors: | AbouElhamayed, Ahmed F., Balle, Susanne, Singh, Deshanand, Abdelfattah, Mohamed S. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.12981 |
Similar Items
Scaling LLM Inference Beyond Amdahl's Limits via Eliminating Non-Scalable Overheads
by: Zhao, Alan, et al.
Published: (2026)
Performance Characterization of Containerized DNN Training and Inference on Edge Accelerators
by: K., Prashanthi S., et al.
Published: (2023)
Online Optimization of DNN Inference Network Utility in Collaborative Edge Computing
by: Li, Rui, et al.
Published: (2024)
Experimental Analysis of Server-Side Caching for Web Performance
by: Umar, Mohammad, et al.
Published: (2026)
Experiences with Model Context Protocol Servers for Science and High Performance Computing
by: Pan, Haochen, et al.
Published: (2025)
Where to Split? A Pareto-Front Analysis of DNN Partitioning for Edge Inference
by: Masud, Adiba, et al.
Published: (2026)
Practical Performance Guarantees for Pipelined DNN Inference
by: Archer, Aaron, et al.
Published: (2023)
Fulcrum: Optimizing Concurrent DNN Training and Inferencing on Edge Accelerators
by: K., Prashanthi S., et al.
Published: (2025)
Modular Architecture for High-Performance and Low Overhead Data Transfers
by: Swargo, Rasman Mubtasim, et al.
Published: (2025)
PipeMax: Enhancing Offline LLM Inference on Commodity GPU Servers
by: Zhang, Hongbin, et al.
Published: (2026)
SlimEdge: Performance and Device Aware Distributed DNN Deployment on Resource-Constrained Edge Hardware
by: Kumar, Mahadev Sunil, et al.
Published: (2025)
Collaborative Inference in DNN-based Satellite Systems with Dynamic Task Streams
by: Guan, Jinglong, et al.
Published: (2023)
Evaluating Multi-Instance DNN Inferencing on Multiple Accelerators of an Edge Device
by: Tayal, Mumuksh, et al.
Published: (2025)
HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions
by: Chen, Jiabin, et al.
Published: (2024)
Adaptive Heuristics for Scheduling DNN Inferencing on Edge and Cloud for Personalized UAV Fleets
by: Raj, Suman, et al.
Published: (2024)
AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices
by: Lin, Zheng, et al.
Published: (2024)
DARIS: An Oversubscribed Spatio-Temporal Scheduler for Real-Time DNN Inference on GPUs
by: Babaei, Amir Fakhim, et al.
Published: (2025)
Why Should the Server Do It All?: A Scalable, Versatile, and Model-Agnostic Framework for Server-Light DNN Inference over Massively Distributed Clients via Training-Free Intermediate Feature Compression
by: Sung, Mingyu, et al.
Published: (2025)
Training DNN Models over Heterogeneous Clusters with Optimal Performance
by: Nie, Chengyi, et al.
Published: (2024)
From Servers to Sites: Compositional Power Trace Generation of LLM Inference for Infrastructure Planning
by: Wilkins, Grant, et al.
Published: (2026)
Collaborative Satellite Computing through Adaptive DNN Task Splitting and Offloading
by: Peng, Shifeng, et al.
Published: (2024)
Infer-EDGE: Dynamic DNN Inference Optimization in 'Just-in-time' Edge-AI Implementations
by: Mounesan, Motahare, et al.
Published: (2025)
Analysis of Server Throughput For Managed Big Data Analytics Frameworks
by: Anagnostakis, Emmanouil, et al.
Published: (2025)
ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments
by: Lee, Munkyu, et al.
Published: (2024)
Adaptive Device-Edge Collaboration on DNN Inference in AIoT: A Digital Twin-Assisted Approach
by: Hu, Shisheng, et al.
Published: (2024)
Ecomap: Sustainability-Driven Optimization of Multi-Tenant DNN Execution on Edge Servers
by: Paramanayakam, Varatheepan, et al.
Published: (2025)
A Converting Autoencoder Toward Low-latency and Energy-efficient DNN Inference at the Edge
by: Mahmud, Hasanul, et al.
Published: (2024)
Are Bus-Mounted Edge Servers Feasible?
by: Li, Xuezhi, et al.
Published: (2025)
Preemption Aware Task Scheduling for Priority and Deadline Constrained DNN Inference Task Offloading in Homogeneous Mobile-Edge Networks
by: Cotter, Jamie, et al.
Published: (2025)
A Survey of End-to-End Modeling for Distributed DNN Training: Workloads, Simulators, and TCO
by: Svedas, Jonas, et al.
Published: (2025)
A Survey on Collaborative DNN Inference for Edge Intelligence
by: Ren, Weiqing, et al.
Published: (2022)
Practical Federated Learning without a Server
by: Dhasade, Akash, et al.
Published: (2025)
AdaBridge: Dynamic Data and Computation Reuse for Efficient Multi-task DNN Co-evolution in Edge Systems
by: Wang, Lehao, et al.
Published: (2024)
Enabling Large Batch Size Training for DNN Models Beyond the Memory Limit While Maintaining Performance
by: Piao, XinYu, et al.
Published: (2021)
Modern Computing: Vision and Challenges
by: Gill, Sukhpal Singh, et al.
Published: (2024)
Checkmate: Zero-Overhead Model Checkpointing via Network Gradient Replication
by: Bhardwaj, Ankit, et al.
Published: (2025)
AMSP: Reducing Communication Overhead of ZeRO for Efficient LLM Training
by: Chen, Qiaoling, et al.
Published: (2023)
KaMPIng: Flexible and (Near) Zero-Overhead C++ Bindings for MPI
by: Uhl, Tim Niklas, et al.
Published: (2024)
SCARIF: Towards Carbon Modeling of Cloud Servers with Accelerators
by: Ji, Shixin, et al.
Published: (2024)
Opara: Exploiting Operator Parallelism for Expediting DNN Inference on GPUs
by: Chen, Aodong, et al.
Published: (2023)