| Main Authors: | Luiz, Anderson de Lima, Kurlekar, Shubham Vijay, Georges, Munir |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.17814 |
Similar Items
Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and NVIDIA Data Center GPUs
by: Sada, Mohammad Firas, et al.
Published: (2025)
Efficiently Scheduling Parallel DAG Tasks on Identical Multiprocessors
by: Lendve, Shardul, et al.
Published: (2024)
Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters
by: Li, Zonghang, et al.
Published: (2025)
A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models
by: Zhang, Lingzhe, et al.
Published: (2025)
ITQ3_S: High-Fidelity 3-bit LLM Inference via Interleaved Ternary Quantization with Rotation-Domain Smoothing
by: Yoon, Edward J.
Published: (2026)
On-Device Generative AI for GDPR-Compliant Visual Monitoring: Natural Language Alerts from Local Object Detection
by: Schappacher-Tilp, Gudrun, et al.
Published: (2026)
DPDPU: Data Processing with DPUs
by: Hu, Jiasheng, et al.
Published: (2024)
Flex-MIG: Enabling Distributed Execution on MIG
by: Kim, Myeongsu, et al.
Published: (2025)
push0: Scalable and Fault-Tolerant Orchestration for Zero-Knowledge Proof Generation
by: Ahmadvand, Mohsen, et al.
Published: (2026)
CRDT-Based Game State Synchronization in Peer-to-Peer VR
by: Dantas, Abel, et al.
Published: (2025)
GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration
by: Sarker, Yeahia, et al.
Published: (2026)
ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge--Cloud Speculative LLM Serving
by: Li, Xiangchen, et al.
Published: (2026)
WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching
by: Li, Xiangchen, et al.
Published: (2026)
Reexamining Paradigms of End-to-End Data Movement
by: Fang, Chin, et al.
Published: (2025)
nvidia-pcm: A D-Bus-Driven Platform Configuration Manager for OpenBMC Environments
by: Singh, Harinder
Published: (2026)
Serverless Cold Starts and Where to Find Them
by: Joosen, Artjom, et al.
Published: (2024)
ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates
by: Lan, Tingfeng, et al.
Published: (2025)
Advancing Annotat3D with Harpia: A CUDA-Accelerated Library For Large-Scale Volumetric Data Segmentation
by: de Araujo, Camila Machado, et al.
Published: (2025)
Addressing tokens dynamic generation, propagation, storage and renewal to secure the GlideinWMS pilot based jobs and system
by: Coimbra, Bruno Moreira, et al.
Published: (2025)
Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI
by: Kolluru, Saicharan
Published: (2025)
RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
by: Agarwal, Amit, et al.
Published: (2025)
Melding the Serverless Control Plane with the Conventional Cluster Manager for Speed and Resource Efficiency
by: Kondrashov, Leonid, et al.
Published: (2025)
D-SMART: Enhancing LLM Dialogue Consistency via Dynamic Structured Memory And Reasoning Tree
by: Lei, Xiang, et al.
Published: (2025)
Benchmarking Catastrophic Forgetting Mitigation Methods in Federated Time Series Forecasting
by: Hallak, Khaled, et al.
Published: (2025)
Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project
by: Penke, Carolin, et al.
Published: (2025)
Accelerating In-transit Isosurface Generation With Topology Preserving Compression
by: Li, Yanliang, et al.
Published: (2024)
Evaluating Large Language Models for Workload Mapping and Scheduling in Heterogeneous HPC Systems
by: Sharma, Aasish Kumar, et al.
Published: (2025)
POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference
by: Kamath, Aditya K, et al.
Published: (2024)
Deep RC: A Scalable Data Engineering and Deep Learning Pipeline
by: Sarker, Arup Kumar, et al.
Published: (2025)
Chronicals: A High-Performance Framework for LLM Fine-Tuning with 3.51x Speedup over Unsloth
by: Nair, Arjun S.
Published: (2026)
Thinking Longer, Not Always Smarter: Evaluating LLM Capabilities in Hierarchical Legal Reasoning
by: Zhang, Li, et al.
Published: (2025)
Sure! Here's a short and concise title for your paper: "Contamination in Generated Text Detection Benchmarks"
by: Dingfelder, Philipp, et al.
Published: (2025)
Combining Serverless and High-Performance Computing Paradigms to support ML Data-Intensive Applications
by: Staylor, Mills, et al.
Published: (2025)
Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study
by: Georgiou, Athos
Published: (2026)
Reducing the GPU Memory Bottleneck with Lossless Compression for ML -- Extended
by: Kamath, Aditya K, et al.
Published: (2026)
Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval
by: Haque, Md. Asraful, et al.
Published: (2026)
The High Cost of Keeping Warm: Characterizing Overhead in Serverless Autoscaling Policies
by: Kondrashov, Leonid, et al.
Published: (2025)
Shattering the Ephemeral Storage Cost Barrier for Data-Intensive Serverless Workflows
by: Ustiugov, Dmitrii, et al.
Published: (2023)
Towards Robust Retrieval-Augmented Generation Based on Knowledge Graph: A Comparative Analysis
by: Amamou, Hazem, et al.
Published: (2026)
GridPilot: Real-Time Grid-Responsive Control for AI Supercomputers
by: Constantinescu, Denisa-Andreea, et al.
Published: (2026)