| Main Authors: | Baldin, Ilya, Goodrich, Michael, Gyurjyan, Vardan, Heyes, Graham, Howard, Derek, Kumar, Yatish, Lawrence, David, Sawatzky, Brad, Sheldon, Stacey, Timmer, Carl |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.12597 |
Similar Items
Introducing JIRIAF: A Virtual Kubelet Integration for Optimizing HPC Resource Provisioning
by: Gyurjyan, Vardan, et al.
Published: (2025)
A 1024 RV-Cores Shared-L1 Cluster with High Bandwidth Memory Link for Low-Latency 6G-SDR
by: Zhang, Yichao, et al.
Published: (2024)
DP-HLS: A High-Level Synthesis Framework for Accelerating Dynamic Programming Algorithms in Bioinformatics
by: Cao, Yingqi, et al.
Published: (2024)
Optimal Parallel Algorithms for Convex Hulls in 2D and 3D under Noisy Primitive Operations
by: Goodrich, Michael T., et al.
Published: (2025)
Serverless Abstractions for Short-Running, Lightweight Streams
by: Carl, Natalie, et al.
Published: (2026)
Bandwidth-Aware Network Topology Optimization for Decentralized Learning
by: Shen, Yipeng, et al.
Published: (2025)
The Carnot Bound: Limits and Possibilities for Bandwidth-Efficient Consensus
by: Lewis-Pye, Andrew, et al.
Published: (2026)
StreamServe: Adaptive Speculative Flows for Low-Latency Disaggregated LLM Serving
by: Kumar, Satyam, et al.
Published: (2026)
Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI
by: Khalilov, Mikhail, et al.
Published: (2024)
Bandwidth-Aware LLM Inference on Heterogeneous Many-Core Supercomputers
by: Lu, Yao, et al.
Published: (2026)
Performance Optimization in Stream Processing Systems: Experiment-Driven Configuration Tuning for Kafka Streams
by: Chen, David, et al.
Published: (2026)
On-Package Memory with Universal Chiplet Interconnect Express (UCIe): A Low Power, High Bandwidth, Low Latency and Low Cost Approach
by: Sharma, Debendra Das, et al.
Published: (2025)
Asynchronous Latency and Fast Atomic Snapshot
by: Bezerra, João Paulo, et al.
Published: (2024)
LA-IMR: Latency-Aware, Predictive In-Memory Routing and Proactive Autoscaling for Tail-Latency-Sensitive Cloud Robotics
by: Seo, Eunil, et al.
Published: (2025)
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
by: Wu, Yongtong, et al.
Published: (2026)
DAG it off: Latency Prefers No Common Coins
by: Amores-Sesar, Ignacio, et al.
Published: (2025)
Methodology for GPU Frequency Switching Latency Measurement
by: Velicka, Daniel, et al.
Published: (2025)
Computation-Bandwidth-Memory Trade-offs: A Unified Paradigm for AI Infrastructure
by: Fan, Yuankai, et al.
Published: (2025)
RL over Commodity Networks: Overcoming the Bandwidth Barrier with Lossless Sparse Deltas
by: Ruan, Chaoyi, et al.
Published: (2026)
WANify: Gauging and Balancing Runtime WAN Bandwidth for Geo-distributed Data Analytics
by: Mohapatra, Anshuman Das, et al.
Published: (2025)
Areon: Latency-Friendly and Resilient Multi-Proposer Consensus
by: Castro-Castilla, Álvaro, et al.
Published: (2025)
Dynamic Client Clustering, Bandwidth Allocation, and Workload Optimization for Semi-synchronous Federated Learning
by: Yu, Liangkun, et al.
Published: (2024)
Bandwidth-Aware and Cost-Efficient Pipeline Parallel Scheduling in Geo-Distributed LLM Training
by: Zhang, Han, et al.
Published: (2026)
Revisiting Speculative Leaderless Protocols for Low-Latency BFT Replication
by: Qian, Daniel, et al.
Published: (2026)
Hiding Latencies in Network-Based Image Loading for Deep Learning
by: Versaci, Francesco, et al.
Published: (2025)
CD-Raft: Reducing the Latency of Distributed Consensus in Cross-Domain Sites
by: Wang, Yangyang, et al.
Published: (2026)
Falcon: Advancing Asynchronous BFT Consensus for Lower Latency and Enhanced Throughput
by: Dai, Xiaohai, et al.
Published: (2025)
Memory Offloading for Large Language Model Inference with Latency SLO Guarantees
by: Ma, Chenxiang, et al.
Published: (2025)
Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding
by: Wang, Zhibin, et al.
Published: (2025)
A New Approach for Evaluating the Performance of Distributed Latency-Sensitive Services
by: Theodoropoulos, Theodoros, et al.
Published: (2024)
Shared Memory-Aware Latency-Sensitive Message Aggregation for Fine-Grained Communication
by: Chandrasekar, Kavitha, et al.
Published: (2024)
BBCA-CHAIN: Low Latency, High Throughput BFT Consensus on a DAG
by: Malkhi, Dahlia, et al.
Published: (2023)
Scene-Aware Latency Estimation for Microservices via Multi-Scale Graph Fusion
by: Sun, Zhichao, et al.
Published: (2026)
OOCO: Latency-disaggregated Architecture for Online-Offline Co-locate LLM Serving
by: Wu, Siyu, et al.
Published: (2025)
Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads
by: Hidayetoglu, Mert, et al.
Published: (2025)
Low-Latency Layer-Aware Proactive and Passive Container Migration in Meta Computing
by: Liu, Mengjie, et al.
Published: (2024)
Torpor: GPU-Enabled Serverless Computing for Low-Latency, Resource-Efficient Inference
by: Yu, Minchen, et al.
Published: (2023)
Chasing the Speed of Light: Low-Latency Planetary-Scale Adaptive Byzantine Consensus
by: Berger, Christian, et al.
Published: (2023)
HydraServe: Minimizing Cold Start Latency for Serverless LLM Serving in Public Clouds
by: Lou, Chiheng, et al.
Published: (2025)
PCR: A Prefetch-Enhanced Cache Reuse System for Low-Latency RAG Serving
by: Wang, Wenfeng, et al.
Published: (2026)