Library Catalog

Bibliographic Details
Main Authors:	Chen, Zheng-An; Luo, Tao
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2510.06954

Similar Items

On Multi-Stage Loss Dynamics in Neural Networks: Mechanisms of Plateau and Descent Stages
by: Chen, Zheng-An, et al.
Published: (2024)

Disentangling Feature Structure: A Mathematically Provable Two-Stage Training Dynamics in Transformers
by: Gong, Zixuan, et al.
Published: (2025)

A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices
by: Gong, Chen, et al.
Published: (2025)

Dynamic Graph Condensation
by: Chen, Dong, et al.
Published: (2025)

Rank, Head-Channel Non-Identifiability, and Symmetry Breaking: A Precise Analysis of Representational Collapse in Transformers
by: Cirrincione, Giansalvo
Published: (2026)

The Inlet Rank Collapse in Implicit Neural Representations: Diagnosis and Unified Remedy
by: Zheng, Jianqiao, et al.
Published: (2026)

Two-Stage Aggregation with Dynamic Local Attention for Irregular Time Series
by: Chen, Xingyu, et al.
Published: (2023)

From Collapse to Improvement: Statistical Perspectives on the Evolutionary Dynamics of Iterative Training on Contaminated Sources
by: Bakshi, Soham, et al.
Published: (2026)

Two-Stage Feature Generation with Transformer and Reinforcement Learning
by: Gao, Wanfu, et al.
Published: (2025)

DynamicLight: Two-Stage Dynamic Traffic Signal Timing
by: Zhang, Liang, et al.
Published: (2022)

A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops
by: Fu, Shi, et al.
Published: (2025)

Mind the Gap: a Spectral Analysis of Rank Collapse and Signal Propagation in Attention Layers
by: Saada, Thiziri Nait, et al.
Published: (2024)

From $O(mn)$ to $O(r^2)$: Two-Sided Low-Rank Communication for Adam in Distributed Training with Memory Efficiency
by: Dang, Sizhe, et al.
Published: (2026)

Generative Early Stage Ranking
by: Hong, Juhee, et al.
Published: (2025)

The Persistence of Neural Collapse Despite Low-Rank Bias
by: Garrod, Connall, et al.
Published: (2024)

From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training
by: Xu, Donglai, et al.
Published: (2025)

SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models
by: Tang, Anke, et al.
Published: (2024)

Phase Diagram of Initial Condensation for Two-layer Neural Networks
by: Chen, Zhengan, et al.
Published: (2023)

Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
by: Qiu, Shikai, et al.
Published: (2025)

Low-Rank Learning by Design: the Role of Network Architecture and Activation Linearity in Gradient Rank Collapse
by: Baker, Bradley T., et al.
Published: (2024)

Lambda-Skip Connections: the architectural component that prevents Rank Collapse
by: Joseph, Federico Arangath, et al.
Published: (2024)

Preventing Representational Rank Collapse in MPNNs by Splitting the Computational Graph
by: Roth, Andreas, et al.
Published: (2024)

Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
by: Li, Hongkang, et al.
Published: (2024)

Two is better than one: A Collapse-free Multi-Reward RLIF Training Framework
by: Joarder, Shourov, et al.
Published: (2026)

Understanding the Staged Dynamics of Transformers in Learning Latent Structure
by: Saha, Rohan, et al.
Published: (2025)

A Closer Look at Model Collapse: From a Generalization-to-Memorization Perspective
by: Shi, Lianghe, et al.
Published: (2025)

From Features to Transformers: Redefining Ranking for Scalable Impact
by: Borisyuk, Fedor, et al.
Published: (2025)

Training-free Heterogeneous Graph Condensation via Data Selection
by: Liang, Yuxuan, et al.
Published: (2024)

Late-Stage Generalization Collapse in Grokking: Detecting anti-grokking with Weightwatcher
by: Prakash, Hari K, et al.
Published: (2026)

Dynamic Rank Adjustment for Accurate and Efficient Neural Network Training
by: Shin, Hyuntak, et al.
Published: (2025)

A Survey on Graph Condensation
by: Xu, Hongjia, et al.
Published: (2024)

How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse
by: Seddik, Mohamed El Amine, et al.
Published: (2024)

GoQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization
by: Xiang, Maoyang, et al.
Published: (2026)

Combatting Dimensional Collapse in LLM Pre-Training Data via Diversified File Selection
by: Fan, Ziqing, et al.
Published: (2025)

TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions
by: YuHang, Wang, et al.
Published: (2025)

Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning
by: Wang, Jinpeng, et al.
Published: (2025)

Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training
by: Song, Kun, et al.
Published: (2024)

Rate of Model Collapse in Recursive Training
by: Suresh, Ananda Theertha, et al.
Published: (2024)

Only Large Weights (And Not Skip Connections) Can Prevent the Perils of Rank Collapse
by: Alman, Josh, et al.
Published: (2025)

Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning
by: Qiu, Haomiao, et al.
Published: (2025)