:: Library Catalog

Bibliographic Details
Main Authors:	Luiz, Anderson de Lima; Kurlekar, Shubham Vijay; Georges, Munir
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing; Artificial Intelligence; 68M20, 68T50; C.4; D.4.7; I.2.7
Online Access:	https://arxiv.org/abs/2508.17814

Similar Items

Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and NVIDIA Data Center GPUs
by: Sada, Mohammad Firas, et al.
Published: (2025)

Efficiently Scheduling Parallel DAG Tasks on Identical Multiprocessors
by: Lendve, Shardul, et al.
Published: (2024)

Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters
by: Li, Zonghang, et al.
Published: (2025)

A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models
by: Zhang, Lingzhe, et al.
Published: (2025)

ITQ3_S: High-Fidelity 3-bit LLM Inference via Interleaved Ternary Quantization with Rotation-Domain Smoothing
by: Yoon, Edward J.
Published: (2026)

On-Device Generative AI for GDPR-Compliant Visual Monitoring: Natural Language Alerts from Local Object Detection
by: Schappacher-Tilp, Gudrun, et al.
Published: (2026)

DPDPU: Data Processing with DPUs
by: Hu, Jiasheng, et al.
Published: (2024)

Flex-MIG: Enabling Distributed Execution on MIG
by: Kim, Myeongsu, et al.
Published: (2025)

push0: Scalable and Fault-Tolerant Orchestration for Zero-Knowledge Proof Generation
by: Ahmadvand, Mohsen, et al.
Published: (2026)

CRDT-Based Game State Synchronization in Peer-to-Peer VR
by: Dantas, Abel, et al.
Published: (2025)

GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration
by: Sarker, Yeahia, et al.
Published: (2026)

ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge-Cloud Speculative LLM Serving
by: Li, Xiangchen, et al.
Published: (2026)

WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching
by: Li, Xiangchen, et al.
Published: (2026)

Reexamining Paradigms of End-to-End Data Movement
by: Fang, Chin, et al.
Published: (2025)

nvidia-pcm: A D-Bus-Driven Platform Configuration Manager for OpenBMC Environments
by: Singh, Harinder
Published: (2026)

Serverless Cold Starts and Where to Find Them
by: Joosen, Artjom, et al.
Published: (2024)

ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates
by: Lan, Tingfeng, et al.
Published: (2025)

Advancing Annotat3D with Harpia: A CUDA-Accelerated Library For Large-Scale Volumetric Data Segmentation
by: de Araujo, Camila Machado, et al.
Published: (2025)

Addressing tokens dynamic generation, propagation, storage and renewal to secure the GlideinWMS pilot based jobs and system
by: Coimbra, Bruno Moreira, et al.
Published: (2025)

Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI
by: Kolluru, Saicharan
Published: (2025)

RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
by: Agarwal, Amit, et al.
Published: (2025)

Melding the Serverless Control Plane with the Conventional Cluster Manager for Speed and Resource Efficiency
by: Kondrashov, Leonid, et al.
Published: (2025)

D-SMART: Enhancing LLM Dialogue Consistency via Dynamic Structured Memory And Reasoning Tree
by: Lei, Xiang, et al.
Published: (2025)

Benchmarking Catastrophic Forgetting Mitigation Methods in Federated Time Series Forecasting
by: Hallak, Khaled, et al.
Published: (2025)

Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project
by: Penke, Carolin, et al.
Published: (2025)

Accelerating In-transit Isosurface Generation With Topology Preserving Compression
by: Li, Yanliang, et al.
Published: (2024)

Evaluating Large Language Models for Workload Mapping and Scheduling in Heterogeneous HPC Systems
by: Sharma, Aasish Kumar, et al.
Published: (2025)

POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference
by: Kamath, Aditya K, et al.
Published: (2024)

Deep RC: A Scalable Data Engineering and Deep Learning Pipeline
by: Sarker, Arup Kumar, et al.
Published: (2025)

Chronicals: A High-Performance Framework for LLM Fine-Tuning with 3.51x Speedup over Unsloth
by: Nair, Arjun S.
Published: (2026)

Thinking Longer, Not Always Smarter: Evaluating LLM Capabilities in Hierarchical Legal Reasoning
by: Zhang, Li, et al.
Published: (2025)

Sure! Here's a short and concise title for your paper: "Contamination in Generated Text Detection Benchmarks"
by: Dingfelder, Philipp, et al.
Published: (2025)

Combining Serverless and High-Performance Computing Paradigms to support ML Data-Intensive Applications
by: Staylor, Mills, et al.
Published: (2025)

Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study
by: Georgiou, Athos
Published: (2026)

Reducing the GPU Memory Bottleneck with Lossless Compression for ML -- Extended
by: Kamath, Aditya K, et al.
Published: (2026)

Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval
by: Haque, Md. Asraful, et al.
Published: (2026)

The High Cost of Keeping Warm: Characterizing Overhead in Serverless Autoscaling Policies
by: Kondrashov, Leonid, et al.
Published: (2025)

Shattering the Ephemeral Storage Cost Barrier for Data-Intensive Serverless Workflows
by: Ustiugov, Dmitrii, et al.
Published: (2023)

Towards Robust Retrieval-Augmented Generation Based on Knowledge Graph: A Comparative Analysis
by: Amamou, Hazem, et al.
Published: (2026)

GridPilot: Real-Time Grid-Responsive Control for AI Supercomputers
by: Constantinescu, Denisa-Andreea, et al.
Published: (2026)