| Main Authors: | Wang, Junjie, Zhou, Pan, Dong, Yiming, Li, Huan, Li, Jia, Zhou, Xun, Lao, Qicheng, Fang, Cong, Lin, Zhouchen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.24218 |
Similar Items
On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm
by: Li, Huan, et al.
Published: (2024)
Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-Sided and Two-Sided Preconditioning
by: Li, Huan, et al.
Published: (2026)
On the $O(\frac{\sqrt{d}}{K^{1/4}})$ Convergence Rate of AdamW Measured by $\ell_1$ Norm
by: Li, Huan, et al.
Published: (2025)
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
by: Xie, Xingyu, et al.
Published: (2022)
Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective
by: Chen, Yang, et al.
Published: (2024)
Convergence Rate Analysis of LION
by: Dong, Yiming, et al.
Published: (2024)
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
by: Wu, Zhoutong, et al.
Published: (2025)
Cross-Modal Knowledge Distillation for Speech Large Language Models
by: Wang, Enzhi, et al.
Published: (2025)
MLAE: Masked LoRA Experts for Visual Parameter-Efficient Fine-Tuning
by: Wang, Junjie, et al.
Published: (2024)
FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models
by: Liu, Junkang, et al.
Published: (2025)
One-to-Normal: Anomaly Personalization for Few-shot Anomaly Detection
by: Li, Yiyue, et al.
Published: (2025)
Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning
by: Zhao, Shiwan, et al.
Published: (2026)
Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure
by: Yang, Haotong, et al.
Published: (2023)
Training Data for Large Language Model
by: Ju, Yiming, et al.
Published: (2024)
Simple Convergence Proof of Adam From a Sign-like Descent Perspective
by: Peng, Hanyang, et al.
Published: (2025)
scInterpreter: Training Large Language Models to Interpret scRNA-seq Data for Cell Type Annotation
by: Li, Cong, et al.
Published: (2024)
IBNorm: Information-Bottleneck Inspired Normalization for Representation Learning
by: Zou, Xiandong, et al.
Published: (2025)
Confounder-Aware Medical Data Selection for Fine-Tuning Pretrained Vision Models
by: Ji, Anyang, et al.
Published: (2025)
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
by: Zhang, Jun, et al.
Published: (2025)
Convergence Rate Analysis of SOAP with Arbitrary Orthogonal Projection Matrices
by: Li, Huan, et al.
Published: (2026)
Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization
by: Li, Huan, et al.
Published: (2021)
Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training
by: Wang, Yisen, et al.
Published: (2025)
Connector-S: A Survey of Connectors in Multi-modal Large Language Models
by: Zhu, Xun, et al.
Published: (2025)
FlowX: Towards Explainable Graph Neural Networks via Message Flows
by: Gui, Shurui, et al.
Published: (2022)
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
by: Chen, Yanxi, et al.
Published: (2023)
Fast Inference for Augmented Large Language Models
by: Shahout, Rana, et al.
Published: (2024)
Unlock the Correlation between Supervised Fine-Tuning and Reinforcement Learning in Training Code Large Language Models
by: Chen, Jie, et al.
Published: (2024)
Stepsize anything: A unified learning rate schedule for budgeted-iteration training
by: Tang, Anda, et al.
Published: (2025)
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
by: Zhuo, Zhijian, et al.
Published: (2025)
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
by: Chen, Zixin, et al.
Published: (2025)
Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models
by: Wang, Hao, et al.
Published: (2025)
scReader: Prompting Large Language Models to Interpret scRNA-seq Data
by: Li, Cong, et al.
Published: (2024)
Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models
by: Shi, Yubin, et al.
Published: (2024)
Number Cookbook: Number Understanding of Language Models and How to Improve It
by: Yang, Haotong, et al.
Published: (2024)
4-bit Shampoo for Memory-Efficient Network Training
by: Wang, Sike, et al.
Published: (2024)
STaR: Sensitive Trajectory Regulation for Unlearning in Large Reasoning Models
by: Zhou, Jingjing, et al.
Published: (2026)
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
by: Zhuo, Zhijian, et al.
Published: (2024)
RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees
by: Ma, Junjie, et al.
Published: (2025)
Large Language Model for Participatory Urban Planning
by: Zhou, Zhilun, et al.
Published: (2024)
Black-Box On-Policy Distillation of Large Language Models
by: Ye, Tianzhu, et al.
Published: (2025)