| Main authors: | Ifath, Md. Monzurul Amin; Haque, Israat |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2604.09611 |
Similar items
Fast Prototyping of Distributed Stream Processing Applications with stream2gym
by: Ifath, Md. Monzurul Amin, et al.
Published: (2024)
Scaling Performance of Large Language Model Pretraining
by: Interrante-Grant, Alexander, et al.
Published: (2025)
Trade-offs in Decentralized Agentic AI Discovery Across the Compute Continuum
by: Dazzi, Patrizio, et al.
Published: (2026)
Efficient Multi-Model Orchestration for Self-Hosted Large Language Models
by: Vangala, Bhanu Prakash, et al.
Published: (2025)
Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers
by: Wang, Shuang, et al.
Published: (2025)
Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention
by: Liao, Mengqi, et al.
Published: (2026)
Can Large Language Models Predict Parallel Code Performance?
by: Bolet, Gregory, et al.
Published: (2025)
Characterizing and Understanding Energy Footprint and Efficiency of Small Language Model on Edges
by: Islam, Md Romyull, et al.
Published: (2025)
Araucaria: Simplifying INC Fault Tolerance with High-Level Intents
by: Parizotto, Ricardo, et al.
Published: (2024)
Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge
by: Abstreiter, Maximilian, et al.
Published: (2025)
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training
by: Liang, Mingyu, et al.
Published: (2025)
Revisiting Disaggregated Large Language Model Serving for Performance and Energy Implications
by: Li, Jiaxi, et al.
Published: (2025)
Can Large Language Models Write Parallel Code?
by: Nichols, Daniel, et al.
Published: (2024)
Hierarchical Autoscaling for Large Language Model Serving with Chiron
by: Patke, Archit, et al.
Published: (2025)
Accelerating HDC-CNN Hybrid Models Using Custom Instructions on RISC-V GPUs
by: Matsumi, Wakuto, et al.
Published: (2025)
HPC-Coder: Modeling Parallel Programs using Large Language Models
by: Nichols, Daniel, et al.
Published: (2023)
LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows
by: Yang, Lingyun, et al.
Published: (2026)
Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines
by: Wagenländer, Marcel, et al.
Published: (2026)
Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows
by: Yang, Yuting, et al.
Published: (2024)
Large Language Model Partitioning for Low-Latency Inference at the Edge
by: Kafetzis, Dimitrios, et al.
Published: (2025)
Equinox: Holistic Fair Scheduling in Serving Large Language Models
by: Wei, Zhixiang, et al.
Published: (2025)
Performance Analysis of Decentralized Federated Learning Deployments
by: Jiang, Chengyan, et al.
Published: (2025)
Understand and Accelerate Memory Processing Pipeline for Large Language Model Inference
by: He, Zifan, et al.
Published: (2026)
Scaling Large Language Model Training on Frontier with Low-Bandwidth Partitioning
by: Xu, Lang, et al.
Published: (2025)
Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving
by: Li, Rui, et al.
Published: (2025)
Accelerating Large Language Model Training with Hybrid GPU-based Compression
by: Xu, Lang, et al.
Published: (2024)
WORKSWORLD: A Domain for Integrated Numeric Planning and Scheduling of Distributed Pipelined Workflows
by: Paul, Taylor, et al.
Published: (2026)
Reinforcement Learning-driven Data-intensive Workflow Scheduling for Volunteer Edge-Cloud
by: Mounesan, Motahare, et al.
Published: (2024)
The (R)evolution of Scientific Workflows in the Agentic AI Era: Towards Autonomous Science
by: Shin, Woong, et al.
Published: (2025)
Scalable Runtime Architecture for Data-driven, Hybrid HPC and ML Workflow Applications
by: Merzky, Andre, et al.
Published: (2025)
A Study on Messaging Trade-offs in Data Streaming for Scientific Workflows
by: George, Anjus, et al.
Published: (2025)
TierCheck: Tiered Checkpointing for Fault Tolerance in Large Language Model Training
by: Han, Shujie, et al.
Published: (2026)
TCM-Serve: Modality-aware Scheduling for Multimodal Large Language Model Inference
by: Papaioannou, Konstantinos, et al.
Published: (2026)
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
by: Xu, Jiaming, et al.
Published: (2025)
A Survey on Large Language Model Acceleration based on KV Cache Management
by: Li, Haoyang, et al.
Published: (2024)
AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure
by: The AIBrix Team, et al.
Published: (2025)
Adaptive Fault Tolerance Mechanisms of Large Language Models in Cloud Computing Environments
by: Jin, Yihong, et al.
Published: (2025)
KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider
by: Wang, Jiahao, et al.
Published: (2025)
Scalable AI-assisted Workflow Management for Detector Design Optimization Using Distributed Computing
by: Anderson, Derek, et al.
Published: (2026)
Keep Your Friends Close: Leveraging Affinity Groups to Accelerate AI Inference Workflows
by: Garrett, Thiago, et al.
Published: (2023)