| Main Authors: | Qi, Ji, Zhu, WenPeng, Li, Li, Wu, Ming, Wu, YingJun, He, Wu, Gao, Xun, Zeng, Jason, Heinrich, Michael |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.21263 |
Similar Items
DiLoCo: Distributed Low-Communication Training of Language Models
by: Douillard, Arthur, et al.
Published: (2023)
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
by: Charles, Zachary, et al.
Published: (2025)
N/S Co‐Doped Graphene Aerogels as Superior Anode Materials for High‐Rate Lithium‐Ion Batteries
by: Kaijie Gu, et al.
Published: (2024)
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
by: Jaghouar, Sami, et al.
Published: (2024)
Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs
by: Defazio, Aaron, et al.
Published: (2025)
CoScale-RL: Efficient Post-Training by Co-Scaling Data and Computation
by: Chen, Yutong, et al.
Published: (2026)
Eager Updates For Overlapped Communication and Computation in DiLoCo
by: Kale, Satyen, et al.
Published: (2025)
InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning
by: Liang, Yan-Shuo, et al.
Published: (2024)
LCQ: Low-Rank Codebook based Quantization for Large Language Models
by: Cai, Wen-Pu, et al.
Published: (2024)
Mixture of LoRA Experts
by: Wu, Xun, et al.
Published: (2024)
LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
by: Xie, Xingyu, et al.
Published: (2024)
Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation
by: Wu, Yize, et al.
Published: (2026)
DiT-HC: Enabling Efficient Training of Visual Generation Model DiT on HPC-oriented CPU Cluster
by: Zhang, Jinxiao, et al.
Published: (2026)
Robust Multi-agent Communication Based on Decentralization-Oriented Adversarial Training
by: Ma, Xuyan, et al.
Published: (2025)
MuLoCo: Muon is a practical inner optimizer for DiLoCo
by: Thérien, Benjamin, et al.
Published: (2025)
The Effectiveness of Local Updates for Decentralized Learning under Data Heterogeneity
by: Wu, Tongle, et al.
Published: (2024)
Ortho-Hydra: Orthogonalized Experts for DiT LoRA
by: Ji, Seunghyun
Published: (2026)
NoLoCo: No-all-reduce Low Communication Training Method for Large Models
by: Kolehmainen, Jari, et al.
Published: (2025)
Communication-Efficient Model Aggregation with Layer Divergence Feedback in Federated Learning
by: Wang, Liwei, et al.
Published: (2024)
Development and Exploratory Validation of the Assisting Mealtime Scale for Dementia Care: Nursing Staff Perspectives on Mealtime Support
by: Hansen (Cindy) Tang, et al.
Published: (2026)
Co-Design of Sensing, Communications, and Control for Low-Altitude Wireless Networks
by: Jin, Haijia, et al.
Published: (2025)
Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE
by: Zhu, Xun, et al.
Published: (2024)
LoSATok: Low-dimensional Semantic-Acoustic Tokenizer for Cross-Domain Audio Understanding and Generation
by: Zhang, Zhisheng, et al.
Published: (2026)
Beyond A Single AI Cluster: A Survey of Decentralized LLM Training
by: Dong, Haotian, et al.
Published: (2025)
What happens when nanochat meets DiLoCo?
by: Acker, Alexander, et al.
Published: (2025)
Decoupled DiLoCo for Resilient Distributed Pre-training
by: Douillard, Arthur, et al.
Published: (2026)
ADF-LoRA: Alternating Low-Rank Aggregation for Decentralized Federated Fine-Tuning
by: Wang, Xiaoyu, et al.
Published: (2025)
Productions of $X(3872)$, $Z_c(3900)$, $X_2(4013)$, and $Z_c(4020)$ in $B_{(s)}$ decays offer strong clues on their molecular nature
by: Wu, Qi, et al.
Published: (2023)
TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models
by: Mu, Lin, et al.
Published: (2026)
LoL: Longer than Longer, Scaling Video Generation to Hour
by: Cui, Justin, et al.
Published: (2026)
InfiCoEvalChain: A Blockchain-Based Decentralized Framework for Collaborative LLM Evaluation
by: Yang, Yifan, et al.
Published: (2026)
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models
by: Yang, Yang, et al.
Published: (2024)
Asymmetric Co-Training for Source-Free Few-Shot Domain Adaptation
by: Li, Gengxu, et al.
Published: (2025)
Near-Field Beam Training: Joint Angle and Range Estimation with DFT Codebook
by: Wu, Xun, et al.
Published: (2023)
Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging
by: Wu, Zihan, et al.
Published: (2024)
Inference-time Alignment via Sparse Junction Steering
by: Hu, Runyi, et al.
Published: (2026)
Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism
by: Ramasinghe, Sameera, et al.
Published: (2025)
ScaleAcross Explorer: Exploring Communication Optimization for Scale-Across AI Model Training
by: Li, Minghao, et al.
Published: (2026)
Co-occurrence is not Factual Association in Language Models
by: Zhang, Xiao, et al.
Published: (2024)
Pre‐ and postpollination barriers between a widespread and a narrow endemic species with one‐by‐one stamen movement
by: Wen‐Qian Xiang, et al.
Published: (2025)