Bibliographic Details
Main Authors:	Maina, Hernán; Wolovick, Nicolás; Benotti, Luciana
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Distributed, Parallel, and Cluster Computing; Machine Learning
Online Access:	https://arxiv.org/abs/2506.08433

Similar Items

Towards cost-effective and resource-aware aggregation at Edge for Federated Learning
by: Khan, Ahmad Faraz, et al.
Published: (2022)

Salted Inference: Enhancing Privacy while Maintaining Efficiency of Split Inference in Mobile Computing
by: Malekzadeh, Mohammad, et al.
Published: (2023)

Bridging Emotions and Architecture: Sentiment Analysis in Modern Distributed Systems
by: Shah, Mahak, et al.
Published: (2025)

Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping
by: Zhang, Muru, et al.
Published: (2025)

MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training
by: Li, Wenxuan, et al.
Published: (2025)

Unlocking Full Efficiency of Token Filtering in Large Language Model Training
by: Chai, Di, et al.
Published: (2025)

semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage
by: Hong, Ke, et al.
Published: (2025)

CAFL-L: Constraint-Aware Federated Learning with Lagrangian Dual Optimization for On-Device Language Models
by: Zheng, Dongqi, et al.
Published: (2025)

Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
by: Charles, Zachary, et al.
Published: (2025)

K⁴: Online Log Anomaly Detection Via Unsupervised Typicality Learning
by: Chen, Weicong, et al.
Published: (2025)

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
by: Jin, Tian, et al.
Published: (2025)

Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
by: Pan, Xuchen, et al.
Published: (2025)

Alchemist: Towards the Design of Efficient Online Continual Learning System
by: Huang, Yuyang, et al.
Published: (2025)

P/D-Device: Disaggregated Large Language Model between Cloud and Devices
by: Jin, Yibo, et al.
Published: (2025)

X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms
by: Yuan, Yueming, et al.
Published: (2025)

Efficient and Adaptable Overlapping for Computation and Communication via Signaling and Reordering
by: Hong, Ke, et al.
Published: (2025)

Optimizing RLHF Training for Large Language Models with Stage Fusion
by: Zhong, Yinmin, et al.
Published: (2024)

Re-evaluating the Memory-balanced Pipeline Parallelism: BPipe
by: Huang, Mincong, et al.
Published: (2024)

Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization
by: Che, Tianshi, et al.
Published: (2023)

Towards Resiliency in Large Language Model Serving with KevlarFlow
by: Qian, Shangshu, et al.
Published: (2026)

Scalable Training of Mixture-of-Experts Models with Megatron Core
by: Yan, Zijie, et al.
Published: (2026)

JORA: JAX Tensor-Parallel LoRA Library for Retrieval Augmented Fine-Tuning
by: Tahir, Anique, et al.
Published: (2024)

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
by: Miao, Xupeng, et al.
Published: (2023)

Fast Matrix Multiplications for Lookup Table-Quantized LLMs
by: Guo, Han, et al.
Published: (2024)

Towards Federated RLHF with Aggregated Client Preference for LLMs
by: Wu, Feijie, et al.
Published: (2024)

LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services
by: Łazuka, Małgorzata, et al.
Published: (2024)

InferCept: Efficient Intercept Support for Augmented Large Language Model Inference
by: Abhyankar, Reyna, et al.
Published: (2024)

SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models
by: Lin, Zheng, et al.
Published: (2024)

PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
by: Butler, Branden, et al.
Published: (2024)

Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution
by: Xu, Yechen, et al.
Published: (2024)

Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow
by: Mei, Yixuan, et al.
Published: (2024)

Optimizing Cross-Client Domain Coverage for Federated Instruction Tuning of Large Language Models
by: Wang, Zezhou, et al.
Published: (2024)

Queue Management for SLO-Oriented Large Language Model Serving
by: Patke, Archit, et al.
Published: (2024)

PAAC: Privacy-Aware Agentic Device-Cloud Collaboration
by: Yuan, Liangqi, et al.
Published: (2026)

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
by: Qiu, Haoran, et al.
Published: (2024)

Pipeline Parallelism with Controllable Memory
by: Qi, Penghui, et al.
Published: (2024)

P/D-Serve: Serving Disaggregated Large Language Model at Scale
by: Jin, Yibo, et al.
Published: (2024)

FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model
by: Wu, Feijie, et al.
Published: (2024)

FlexLLM: Token-Level Co-Serving of LLM Inference and Finetuning with SLO Guarantees
by: Oliaro, Gabriele, et al.
Published: (2024)

Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
by: Cai, Weilin, et al.
Published: (2024)