:: Library Catalog

Bibliographic Details
Main Authors:	Qi, Ji; Zhu, WenPeng; Li, Li; Wu, Ming; Wu, YingJun; He, Wu; Gao, Xun; Zeng, Jason; Heinrich, Michael
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2506.21263

Similar Items

DiLoCo: Distributed Low-Communication Training of Language Models
by: Douillard, Arthur, et al.
Published: (2023)

Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
by: Charles, Zachary, et al.
Published: (2025)

N/S Co‐Doped Graphene Aerogels as Superior Anode Materials for High‐Rate Lithium‐Ion Batteries
by: Kaijie Gu, et al.
Published: (2024)

OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
by: Jaghouar, Sami, et al.
Published: (2024)

Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs
by: Defazio, Aaron, et al.
Published: (2025)

CoScale-RL: Efficient Post-Training by Co-Scaling Data and Computation
by: Chen, Yutong, et al.
Published: (2026)

Eager Updates For Overlapped Communication and Computation in DiLoCo
by: Kale, Satyen, et al.
Published: (2025)

InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning
by: Liang, Yan-Shuo, et al.
Published: (2024)

LCQ: Low-Rank Codebook based Quantization for Large Language Models
by: Cai, Wen-Pu, et al.
Published: (2024)

Mixture of LoRA Experts
by: Wu, Xun, et al.
Published: (2024)

LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
by: Xie, Xingyu, et al.
Published: (2024)

Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation
by: Wu, Yize, et al.
Published: (2026)

DiT-HC: Enabling Efficient Training of Visual Generation Model DiT on HPC-oriented CPU Cluster
by: Zhang, Jinxiao, et al.
Published: (2026)

Robust Multi-agent Communication Based on Decentralization-Oriented Adversarial Training
by: Ma, Xuyan, et al.
Published: (2025)

MuLoCo: Muon is a practical inner optimizer for DiLoCo
by: Thérien, Benjamin, et al.
Published: (2025)

The Effectiveness of Local Updates for Decentralized Learning under Data Heterogeneity
by: Wu, Tongle, et al.
Published: (2024)

Ortho-Hydra: Orthogonalized Experts for DiT LoRA
by: Ji, Seunghyun
Published: (2026)

NoLoCo: No-all-reduce Low Communication Training Method for Large Models
by: Kolehmainen, Jari, et al.
Published: (2025)

Communication-Efficient Model Aggregation with Layer Divergence Feedback in Federated Learning
by: Wang, Liwei, et al.
Published: (2024)

Development and Exploratory Validation of the Assisting Mealtime Scale for Dementia Care: Nursing Staff Perspectives on Mealtime Support
by: Hansen (Cindy) Tang, et al.
Published: (2026)

Co-Design of Sensing, Communications, and Control for Low-Altitude Wireless Networks
by: Jin, Haijia, et al.
Published: (2025)

Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE
by: Zhu, Xun, et al.
Published: (2024)

LoSATok: Low-dimensional Semantic-Acoustic Tokenizer for Cross-Domain Audio Understanding and Generation
by: Zhang, Zhisheng, et al.
Published: (2026)

Beyond A Single AI Cluster: A Survey of Decentralized LLM Training
by: Dong, Haotian, et al.
Published: (2025)

What happens when nanochat meets DiLoCo?
by: Acker, Alexander, et al.
Published: (2025)

Decoupled DiLoCo for Resilient Distributed Pre-training
by: Douillard, Arthur, et al.
Published: (2026)

ADF-LoRA: Alternating Low-Rank Aggregation for Decentralized Federated Fine-Tuning
by: Wang, Xiaoyu, et al.
Published: (2025)

Productions of $X(3872)$, $Z_c(3900)$, $X_2(4013)$, and $Z_c(4020)$ in $B_{(s)}$ decays offer strong clues on their molecular nature
by: Wu, Qi, et al.
Published: (2023)

TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models
by: Mu, Lin, et al.
Published: (2026)

LoL: Longer than Longer, Scaling Video Generation to Hour
by: Cui, Justin, et al.
Published: (2026)

InfiCoEvalChain: A Blockchain-Based Decentralized Framework for Collaborative LLM Evaluation
by: Yang, Yifan, et al.
Published: (2026)

LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models
by: Yang, Yang, et al.
Published: (2024)

Asymmetric Co-Training for Source-Free Few-Shot Domain Adaptation
by: Li, Gengxu, et al.
Published: (2025)

Near-Field Beam Training: Joint Angle and Range Estimation with DFT Codebook
by: Wu, Xun, et al.
Published: (2023)

Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging
by: Wu, Zihan, et al.
Published: (2024)

Inference-time Alignment via Sparse Junction Steering
by: Hu, Runyi, et al.
Published: (2026)

Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism
by: Ramasinghe, Sameera, et al.
Published: (2025)

ScaleAcross Explorer: Exploring Communication Optimization for Scale-Across AI Model Training
by: Li, Minghao, et al.
Published: (2026)

Co-occurrence is not Factual Association in Language Models
by: Zhang, Xiao, et al.
Published: (2024)

Pre‐ and postpollination barriers between a widespread and a narrow endemic species with one‐by‐one stamen movement
by: Wen‐Qian Xiang, et al.
Published: (2025)