| Main Authors: | Chen, Zheng-An, Luo, Tao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.06954 |
Similar Items
On Multi-Stage Loss Dynamics in Neural Networks: Mechanisms of Plateau and Descent Stages
by: Chen, Zheng-An, et al.
Published: (2024)
Disentangling Feature Structure: A Mathematically Provable Two-Stage Training Dynamics in Transformers
by: Gong, Zixuan, et al.
Published: (2025)
A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices
by: Gong, Chen, et al.
Published: (2025)
Dynamic Graph Condensation
by: Chen, Dong, et al.
Published: (2025)
Rank, Head-Channel Non-Identifiability, and Symmetry Breaking: A Precise Analysis of Representational Collapse in Transformers
by: Cirrincione, Giansalvo
Published: (2026)
The Inlet Rank Collapse in Implicit Neural Representations: Diagnosis and Unified Remedy
by: Zheng, Jianqiao, et al.
Published: (2026)
Two-Stage Aggregation with Dynamic Local Attention for Irregular Time Series
by: Chen, Xingyu, et al.
Published: (2023)
From Collapse to Improvement: Statistical Perspectives on the Evolutionary Dynamics of Iterative Training on Contaminated Sources
by: Bakshi, Soham, et al.
Published: (2026)
Two-Stage Feature Generation with Transformer and Reinforcement Learning
by: Gao, Wanfu, et al.
Published: (2025)
DynamicLight: Two-Stage Dynamic Traffic Signal Timing
by: Zhang, Liang, et al.
Published: (2022)
A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops
by: Fu, Shi, et al.
Published: (2025)
Mind the Gap: a Spectral Analysis of Rank Collapse and Signal Propagation in Attention Layers
by: Saada, Thiziri Nait, et al.
Published: (2024)
From $O(mn)$ to $O(r^2)$: Two-Sided Low-Rank Communication for Adam in Distributed Training with Memory Efficiency
by: Dang, Sizhe, et al.
Published: (2026)
Generative Early Stage Ranking
by: Hong, Juhee, et al.
Published: (2025)
The Persistence of Neural Collapse Despite Low-Rank Bias
by: Garrod, Connall, et al.
Published: (2024)
From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training
by: Xu, Donglai, et al.
Published: (2025)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models
by: Tang, Anke, et al.
Published: (2024)
Phase Diagram of Initial Condensation for Two-layer Neural Networks
by: Chen, Zhengan, et al.
Published: (2023)
Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
by: Qiu, Shikai, et al.
Published: (2025)
Low-Rank Learning by Design: the Role of Network Architecture and Activation Linearity in Gradient Rank Collapse
by: Baker, Bradley T., et al.
Published: (2024)
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
by: Joseph, Federico Arangath, et al.
Published: (2024)
Preventing Representational Rank Collapse in MPNNs by Splitting the Computational Graph
by: Roth, Andreas, et al.
Published: (2024)
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
by: Li, Hongkang, et al.
Published: (2024)
Two is better than one: A Collapse-free Multi-Reward RLIF Training Framework
by: Joarder, Shourov, et al.
Published: (2026)
Understanding the Staged Dynamics of Transformers in Learning Latent Structure
by: Saha, Rohan, et al.
Published: (2025)
A Closer Look at Model Collapse: From a Generalization-to-Memorization Perspective
by: Shi, Lianghe, et al.
Published: (2025)
From Features to Transformers: Redefining Ranking for Scalable Impact
by: Borisyuk, Fedor, et al.
Published: (2025)
Training-free Heterogeneous Graph Condensation via Data Selection
by: Liang, Yuxuan, et al.
Published: (2024)
Late-Stage Generalization Collapse in Grokking: Detecting anti-grokking with Weightwatcher
by: Prakash, Hari K, et al.
Published: (2026)
Dynamic Rank Adjustment for Accurate and Efficient Neural Network Training
by: Shin, Hyuntak, et al.
Published: (2025)
A Survey on Graph Condensation
by: Xu, Hongjia, et al.
Published: (2024)
How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse
by: Seddik, Mohamed El Amine, et al.
Published: (2024)
GoQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization
by: Xiang, Maoyang, et al.
Published: (2026)
Combatting Dimensional Collapse in LLM Pre-Training Data via Diversified File Selection
by: Fan, Ziqing, et al.
Published: (2025)
TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions
by: YuHang, Wang, et al.
Published: (2025)
Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning
by: Wang, Jinpeng, et al.
Published: (2025)
Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training
by: Song, Kun, et al.
Published: (2024)
Rate of Model Collapse in Recursive Training
by: Suresh, Ananda Theertha, et al.
Published: (2024)
Only Large Weights (And Not Skip Connections) Can Prevent the Perils of Rank Collapse
by: Alman, Josh, et al.
Published: (2025)
Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning
by: Qiu, Haomiao, et al.
Published: (2025)