Library Catalog

Bibliographic Details
Main Authors:	Li, Zichong; Liu, Liming; Liang, Chen; Chen, Weizhu; Zhao, Tuo
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2510.05491

Similar Items

SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
by: Li, Zichong, et al.
Published: (2025)

MuonAll: Muon Variant for Efficient Finetuning of Large Language Models
by: Page, Saurabh, et al.
Published: (2025)

Muon is Scalable for LLM Training
by: Liu, Jingyuan, et al.
Published: (2025)

COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
by: Liu, Liming, et al.
Published: (2025)

Delving into Muon and Beyond: Deep Analysis and Extensions
by: Qi, Xianbiao, et al.
Published: (2026)

Shuffle the Context: RoPE-Perturbed Self-Distillation for Long-Context Adaptation
by: Li, Zichong, et al.
Published: (2026)

Mousse: Rectifying the Geometry of Muon with Curvature-Aware Preconditioning
by: Zhang, Yechen, et al.
Published: (2026)

MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training
by: Li, Jiacheng, et al.
Published: (2026)

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
by: Ren, Liliang, et al.
Published: (2024)

LLMs Can Generate a Better Answer by Aggregating Their Own Responses
by: Li, Zichong, et al.
Published: (2025)

SignMuon: Communication-Efficient Distributed Muon Optimization
by: Mishra, Neel, et al.
Published: (2026)

AtP*: An efficient and scalable method for localizing LLM behaviour to components
by: Kramár, János, et al.
Published: (2024)

Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment
by: Yin, Yueqin, et al.
Published: (2024)

LiMuon: Light and Fast Muon Optimizer for Large Models
by: Huang, Feihu, et al.
Published: (2025)

AdaMuon: Adaptive Muon Optimizer
by: Si, Chongjie, et al.
Published: (2025)

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
by: Liang, Xiao, et al.
Published: (2025)

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
by: Zhong, Ming, et al.
Published: (2023)

AMO: Adaptive Muon Orthogonalization
by: Zhuang, Xinlin, et al.
Published: (2026)

A Note on LoRA
by: Fomenko, Vlad, et al.
Published: (2024)

MuonQ: Enhancing Low-Bit Muon Quantization via Directional Fidelity Optimization
by: Su, Yupeng, et al.
Published: (2026)

Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning
by: Liu, Ziyue, et al.
Published: (2026)

Muon Optimizes Under Spectral Norm Constraints
by: Chen, Lizhang, et al.
Published: (2025)

MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models
by: Huang, Feihu, et al.
Published: (2026)

RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably
by: Du, Yufeng, et al.
Published: (2026)

Fairness in Large Language Models in Three Hours
by: Viet, Thang Doan, et al.
Published: (2024)

On the Convergence of Muon and Beyond
by: Chang, Da, et al.
Published: (2025)

Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning
by: Lu, Binghang, et al.
Published: (2026)

Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts
by: Yin, Yueqin, et al.
Published: (2024)

Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena
by: Luo, Haipeng, et al.
Published: (2024)

Datasets for Fairness in Language Models: An In-Depth Survey
by: Zhang, Jiale, et al.
Published: (2025)

Fairness Definitions in Language Models Explained
by: Yin, Zhipeng, et al.
Published: (2024)

Phases of Muon: When Muon Eclipses SignSGD
by: Paquette, Elliot, et al.
Published: (2026)

NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training
by: Dolatabadi, Hadi Mohaghegh, et al.
Published: (2026)

MuonBP: Faster Muon via Block-Periodic Orthogonalization
by: Khaled, Ahmed, et al.
Published: (2025)

DynMuon: A Dynamic Spectral Shaping View of Muon
by: Wu, Fangzhou, et al.
Published: (2026)

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
by: Zhang, Qingru, et al.
Published: (2023)

Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
by: Peng, Miao, et al.
Published: (2025)

HiTZ at VarDial 2025 NorSID: Overcoming Data Scarcity with Language Transfer and Automatic Data Annotation
by: Bengoetxea, Jaione, et al.
Published: (2024)

LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters
by: Bogachev, Vladimir, et al.
Published: (2025)

Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
by: Ren, Liliang, et al.
Published: (2025)