| Main Authors: | Yan, Jun, Huang, Weiquan, Zuo, Jiankai, Mo, Yujian, Fang, Xi, Wu, Chengliang, Wei, Zeming |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.26929 |
Similar Items
Fight Back Against Jailbreaking via Prompt Adversarial Tuning
by: Mo, Yichuan, et al.
Published: (2024)
When and Why Grouping Attention Heads Accelerates Muon Optimization
by: Zhang, Hongtao, et al.
Published: (2026)
AdaMuon: Adaptive Muon Optimizer
by: Si, Chongjie, et al.
Published: (2025)
Identifying and Understanding Cross-Class Features in Adversarial Training
by: Wei, Zeming, et al.
Published: (2025)
DiM-TS: Bridge the Gap between Selective State Space Models and Time Series for Generative Modeling
by: Yao, Zihao, et al.
Published: (2025)
Phases of Muon: When Muon Eclipses SignSGD
by: Paquette, Elliot, et al.
Published: (2026)
A Theoretical Understanding of Self-Correction through In-context Alignment
by: Wang, Yifei, et al.
Published: (2024)
On the Duality Between Sharpness-Aware Minimization and Adversarial Training
by: Zhang, Yihao, et al.
Published: (2024)
LiMuon: Light and Fast Muon Optimizer for Large Models
by: Huang, Feihu, et al.
Published: (2025)
When LLM Agents Meet Graph Optimization: An Automated Data Quality Improvement Approach
by: Zhang, Zhihan, et al.
Published: (2025)
Breaking Symmetry When Training Transformers
by: Zuo, Chunsheng, et al.
Published: (2024)
When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
by: Wang, Junxiong, et al.
Published: (2026)
When Invariant Representation Learning Meets Label Shift: Insufficiency and Theoretical Insights
by: Luo, You-Wei, et al.
Published: (2024)
Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
by: Zhang, Yihao, et al.
Published: (2024)
NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training
by: Dolatabadi, Hadi Mohaghegh, et al.
Published: (2026)
MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models
by: Huang, Feihu, et al.
Published: (2026)
Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence
by: Fu, Shaopeng, et al.
Published: (2025)
Enhancing Adversarial Training via Reweighting Optimization Trajectory
by: Huang, Tianjin, et al.
Published: (2023)
Muon is Scalable for LLM Training
by: Liu, Jingyuan, et al.
Published: (2025)
Democratic Training Against Universal Adversarial Perturbations
by: Sun, Bing, et al.
Published: (2025)
Decoding Large Language Diffusion Models with Foreseeing Movement
by: Mo, Yichuan, et al.
Published: (2025)
ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy
by: Yang, Zonghan, et al.
Published: (2024)
Lions and Muons: Optimization via Stochastic Frank-Wolfe
by: Sfyraki, Maria-Eleni, et al.
Published: (2025)
Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective
by: Xing, Yue, et al.
Published: (2024)
When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values
by: Muller, Christophe, et al.
Published: (2025)
MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training
by: Li, Jiacheng, et al.
Published: (2026)
LaMsS: When Large Language Models Meet Self-Skepticism
by: Wu, Yetao, et al.
Published: (2024)
On the Convergence Analysis of Muon
by: Shen, Wei, et al.
Published: (2025)
AdaGrad Meets Muon: Adaptive Stepsizes for Orthogonal Updates
by: Zhang, Minxin, et al.
Published: (2025)
When and Why Adversarial Training Improves PINNs: A Neural Tangent Kernel Perspective
by: Cao, Yuan-dong, et al.
Published: (2026)
Active Learning For Contextual Linear Optimization: A Margin-Based Approach
by: Liu, Mo, et al.
Published: (2023)
SignMuon: Communication-Efficient Distributed Muon Optimization
by: Mishra, Neel, et al.
Published: (2026)
Effective Quantization of Muon Optimizer States
by: Gupta, Aman, et al.
Published: (2025)
MuonQ: Enhancing Low-Bit Muon Quantization via Directional Fidelity Optimization
by: Su, Yupeng, et al.
Published: (2026)
Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives
by: Liu, Wei, et al.
Published: (2026)
The Newton-Muon Optimizer
by: Du, Zhehang, et al.
Published: (2026)
A Dynamic Stiefel Graph Neural Network for Efficient Spatio-Temporal Time Series Forecasting
by: Zheng, Jiankai, et al.
Published: (2025)
On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
by: Zheng, Chenyu, et al.
Published: (2024)
MuCon: Clipped Muon Updates for LLM Training
by: Yi, Albert
Published: (2026)
Information Theoretic Adversarial Training of Large Language Models
by: Zhang, Yiwei, et al.
Published: (2026)