| Main Authors: | Li, Zichong; Liu, Liming; Liang, Chen; Chen, Weizhu; Zhao, Tuo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.05491 |
Similar Items
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
by: Li, Zichong, et al.
Published: (2025)
MuonAll: Muon Variant for Efficient Finetuning of Large Language Models
by: Page, Saurabh, et al.
Published: (2025)
Muon is Scalable for LLM Training
by: Liu, Jingyuan, et al.
Published: (2025)
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
by: Liu, Liming, et al.
Published: (2025)
Delving into Muon and Beyond: Deep Analysis and Extensions
by: Qi, Xianbiao, et al.
Published: (2026)
Shuffle the Context: RoPE-Perturbed Self-Distillation for Long-Context Adaptation
by: Li, Zichong, et al.
Published: (2026)
Mousse: Rectifying the Geometry of Muon with Curvature-Aware Preconditioning
by: Zhang, Yechen, et al.
Published: (2026)
MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training
by: Li, Jiacheng, et al.
Published: (2026)
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
by: Ren, Liliang, et al.
Published: (2024)
LLMs Can Generate a Better Answer by Aggregating Their Own Responses
by: Li, Zichong, et al.
Published: (2025)
SignMuon: Communication-Efficient Distributed Muon Optimization
by: Mishra, Neel, et al.
Published: (2026)
AtP*: An efficient and scalable method for localizing LLM behaviour to components
by: Kramár, János, et al.
Published: (2024)
Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment
by: Yin, Yueqin, et al.
Published: (2024)
LiMuon: Light and Fast Muon Optimizer for Large Models
by: Huang, Feihu, et al.
Published: (2025)
AdaMuon: Adaptive Muon Optimizer
by: Si, Chongjie, et al.
Published: (2025)
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
by: Liang, Xiao, et al.
Published: (2025)
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
by: Zhong, Ming, et al.
Published: (2023)
AMO: Adaptive Muon Orthogonalization
by: Zhuang, Xinlin, et al.
Published: (2026)
A Note on LoRA
by: Fomenko, Vlad, et al.
Published: (2024)
MuonQ: Enhancing Low-Bit Muon Quantization via Directional Fidelity Optimization
by: Su, Yupeng, et al.
Published: (2026)
Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning
by: Liu, Ziyue, et al.
Published: (2026)
Muon Optimizes Under Spectral Norm Constraints
by: Chen, Lizhang, et al.
Published: (2025)
MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models
by: Huang, Feihu, et al.
Published: (2026)
RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably
by: Du, Yufeng, et al.
Published: (2026)
Fairness in Large Language Models in Three Hours
by: Viet, Thang Doan, et al.
Published: (2024)
On the Convergence of Muon and Beyond
by: Chang, Da, et al.
Published: (2025)
Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning
by: Lu, Binghang, et al.
Published: (2026)
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts
by: Yin, Yueqin, et al.
Published: (2024)
Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena
by: Luo, Haipeng, et al.
Published: (2024)
Datasets for Fairness in Language Models: An In-Depth Survey
by: Zhang, Jiale, et al.
Published: (2025)
Fairness Definitions in Language Models Explained
by: Yin, Zhipeng, et al.
Published: (2024)
Phases of Muon: When Muon Eclipses SignSGD
by: Paquette, Elliot, et al.
Published: (2026)
NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training
by: Dolatabadi, Hadi Mohaghegh, et al.
Published: (2026)
MuonBP: Faster Muon via Block-Periodic Orthogonalization
by: Khaled, Ahmed, et al.
Published: (2025)
DynMuon: A Dynamic Spectral Shaping View of Muon
by: Wu, Fangzhou, et al.
Published: (2026)
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
by: Zhang, Qingru, et al.
Published: (2023)
Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
by: Peng, Miao, et al.
Published: (2025)
HiTZ at VarDial 2025 NorSID: Overcoming Data Scarcity with Language Transfer and Automatic Data Annotation
by: Bengoetxea, Jaione, et al.
Published: (2024)
LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters
by: Bogachev, Vladimir, et al.
Published: (2025)
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
by: Ren, Liliang, et al.
Published: (2025)