:: Library Catalog

Bibliographic Details
Main Authors:	Ifath, Md. Monzurul Amin; Haque, Israat
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.09611

Similar Items

Fast Prototyping of Distributed Stream Processing Applications with stream2gym
by: Ifath, Md. Monzurul Amin, et al.
Published: (2024)

Scaling Performance of Large Language Model Pretraining
by: Interrante-Grant, Alexander, et al.
Published: (2025)

Trade-offs in Decentralized Agentic AI Discovery Across the Compute Continuum
by: Dazzi, Patrizio, et al.
Published: (2026)

Efficient Multi-Model Orchestration for Self-Hosted Large Language Models
by: Vangala, Bhanu Prakash, et al.
Published: (2025)

Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers
by: Wang, Shuang, et al.
Published: (2025)

Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention
by: Liao, Mengqi, et al.
Published: (2026)

Can Large Language Models Predict Parallel Code Performance?
by: Bolet, Gregory, et al.
Published: (2025)

Characterizing and Understanding Energy Footprint and Efficiency of Small Language Model on Edges
by: Islam, Md Romyull, et al.
Published: (2025)

Araucaria: Simplifying INC Fault Tolerance with High-Level Intents
by: Parizotto, Ricardo, et al.
Published: (2024)

Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge
by: Abstreiter, Maximilian, et al.
Published: (2025)

Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training
by: Liang, Mingyu, et al.
Published: (2025)

Revisiting Disaggregated Large Language Model Serving for Performance and Energy Implications
by: Li, Jiaxi, et al.
Published: (2025)

Can Large Language Models Write Parallel Code?
by: Nichols, Daniel, et al.
Published: (2024)

Hierarchical Autoscaling for Large Language Model Serving with Chiron
by: Patke, Archit, et al.
Published: (2025)

Accelerating HDC-CNN Hybrid Models Using Custom Instructions on RISC-V GPUs
by: Matsumi, Wakuto, et al.
Published: (2025)

HPC-Coder: Modeling Parallel Programs using Large Language Models
by: Nichols, Daniel, et al.
Published: (2023)

LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows
by: Yang, Lingyun, et al.
Published: (2026)

Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines
by: Wagenländer, Marcel, et al.
Published: (2026)

Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows
by: Yang, Yuting, et al.
Published: (2024)

Large Language Model Partitioning for Low-Latency Inference at the Edge
by: Kafetzis, Dimitrios, et al.
Published: (2025)

Equinox: Holistic Fair Scheduling in Serving Large Language Models
by: Wei, Zhixiang, et al.
Published: (2025)

Performance Analysis of Decentralized Federated Learning Deployments
by: Jiang, Chengyan, et al.
Published: (2025)

Understand and Accelerate Memory Processing Pipeline for Large Language Model Inference
by: He, Zifan, et al.
Published: (2026)

Scaling Large Language Model Training on Frontier with Low-Bandwidth Partitioning
by: Xu, Lang, et al.
Published: (2025)

Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving
by: Li, Rui, et al.
Published: (2025)

Accelerating Large Language Model Training with Hybrid GPU-based Compression
by: Xu, Lang, et al.
Published: (2024)

WORKSWORLD: A Domain for Integrated Numeric Planning and Scheduling of Distributed Pipelined Workflows
by: Paul, Taylor, et al.
Published: (2026)

Reinforcement Learning-driven Data-intensive Workflow Scheduling for Volunteer Edge-Cloud
by: Mounesan, Motahare, et al.
Published: (2024)

The (R)evolution of Scientific Workflows in the Agentic AI Era: Towards Autonomous Science
by: Shin, Woong, et al.
Published: (2025)

Scalable Runtime Architecture for Data-driven, Hybrid HPC and ML Workflow Applications
by: Merzky, Andre, et al.
Published: (2025)

A Study on Messaging Trade-offs in Data Streaming for Scientific Workflows
by: George, Anjus, et al.
Published: (2025)

TierCheck: Tiered Checkpointing for Fault Tolerance in Large Language Model Training
by: Han, Shujie, et al.
Published: (2026)

TCM-Serve: Modality-aware Scheduling for Multimodal Large Language Model Inference
by: Papaioannou, Konstantinos, et al.
Published: (2026)

SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
by: Xu, Jiaming, et al.
Published: (2025)

A Survey on Large Language Model Acceleration based on KV Cache Management
by: Li, Haoyang, et al.
Published: (2024)

AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure
by: The AIBrix Team, et al.
Published: (2025)

Adaptive Fault Tolerance Mechanisms of Large Language Models in Cloud Computing Environments
by: Jin, Yihong, et al.
Published: (2025)

KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider
by: Wang, Jiahao, et al.
Published: (2025)

Scalable AI-assisted Workflow Management for Detector Design Optimization Using Distributed Computing
by: Anderson, Derek, et al.
Published: (2026)

Keep Your Friends Close: Leveraging Affinity Groups to Accelerate AI Inference Workflows
by: Garrett, Thiago, et al.
Published: (2023)