| Main Authors: | Maina, Hernán, Wolovick, Nicolás, Benotti, Luciana |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.08433 |
Similar Items
Towards cost-effective and resource-aware aggregation at Edge for Federated Learning
by: Khan, Ahmad Faraz, et al.
Published: (2022)
Salted Inference: Enhancing Privacy while Maintaining Efficiency of Split Inference in Mobile Computing
by: Malekzadeh, Mohammad, et al.
Published: (2023)
Bridging Emotions and Architecture: Sentiment Analysis in Modern Distributed Systems
by: Shah, Mahak, et al.
Published: (2025)
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping
by: Zhang, Muru, et al.
Published: (2025)
MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training
by: Li, Wenxuan, et al.
Published: (2025)
Unlocking Full Efficiency of Token Filtering in Large Language Model Training
by: Chai, Di, et al.
Published: (2025)
semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage
by: Hong, Ke, et al.
Published: (2025)
CAFL-L: Constraint-Aware Federated Learning with Lagrangian Dual Optimization for On-Device Language Models
by: Zheng, Dongqi, et al.
Published: (2025)
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
by: Charles, Zachary, et al.
Published: (2025)
$K^4$: Online Log Anomaly Detection Via Unsupervised Typicality Learning
by: Chen, Weicong, et al.
Published: (2025)
Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
by: Jin, Tian, et al.
Published: (2025)
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
by: Pan, Xuchen, et al.
Published: (2025)
Alchemist: Towards the Design of Efficient Online Continual Learning System
by: Huang, Yuyang, et al.
Published: (2025)
P/D-Device: Disaggregated Large Language Model between Cloud and Devices
by: Jin, Yibo, et al.
Published: (2025)
X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms
by: Yuan, Yueming, et al.
Published: (2025)
Efficient and Adaptable Overlapping for Computation and Communication via Signaling and Reordering
by: Hong, Ke, et al.
Published: (2025)
Optimizing RLHF Training for Large Language Models with Stage Fusion
by: Zhong, Yinmin, et al.
Published: (2024)
Re-evaluating the Memory-balanced Pipeline Parallelism: BPipe
by: Huang, Mincong, et al.
Published: (2024)
Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization
by: Che, Tianshi, et al.
Published: (2023)
Towards Resiliency in Large Language Model Serving with KevlarFlow
by: Qian, Shangshu, et al.
Published: (2026)
Scalable Training of Mixture-of-Experts Models with Megatron Core
by: Yan, Zijie, et al.
Published: (2026)
JORA: JAX Tensor-Parallel LoRA Library for Retrieval Augmented Fine-Tuning
by: Tahir, Anique, et al.
Published: (2024)
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
by: Miao, Xupeng, et al.
Published: (2023)
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
by: Guo, Han, et al.
Published: (2024)
Towards Federated RLHF with Aggregated Client Preference for LLMs
by: Wu, Feijie, et al.
Published: (2024)
LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services
by: Łazuka, Małgorzata, et al.
Published: (2024)
InferCept: Efficient Intercept Support for Augmented Large Language Model Inference
by: Abhyankar, Reyna, et al.
Published: (2024)
SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models
by: Lin, Zheng, et al.
Published: (2024)
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
by: Butler, Branden, et al.
Published: (2024)
Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution
by: Xu, Yechen, et al.
Published: (2024)
Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow
by: Mei, Yixuan, et al.
Published: (2024)
Optimizing Cross-Client Domain Coverage for Federated Instruction Tuning of Large Language Models
by: Wang, Zezhou, et al.
Published: (2024)
Queue management for SLO-oriented large language model serving
by: Patke, Archit, et al.
Published: (2024)
PAAC: Privacy-Aware Agentic Device-Cloud Collaboration
by: Yuan, Liangqi, et al.
Published: (2026)
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
by: Qiu, Haoran, et al.
Published: (2024)
Pipeline Parallelism with Controllable Memory
by: Qi, Penghui, et al.
Published: (2024)
P/D-Serve: Serving Disaggregated Large Language Model at Scale
by: Jin, Yibo, et al.
Published: (2024)
FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model
by: Wu, Feijie, et al.
Published: (2024)
FlexLLM: Token-Level Co-Serving of LLM Inference and Finetuning with SLO Guarantees
by: Oliaro, Gabriele, et al.
Published: (2024)
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
by: Cai, Weilin, et al.
Published: (2024)