| Main Authors: | Yan, Tingkai, Wen, Haodong, Li, Binghui, Luo, Kairong, Chen, Wenguang, Lyu, Kaifeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.13421 |
Similar Items
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
by: Luo, Kairong, et al.
Published: (2025)
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
by: Luo, Kairong, et al.
Published: (2025)
Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold
by: Li, Xinghan, et al.
Published: (2025)
Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
by: Li, Binghui, et al.
Published: (2024)
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
by: Li, Binghui, et al.
Published: (2024)
PCMind-2.1-Kaiyuan-2B Technical Report
by: Luo, Kairong, et al.
Published: (2025)
Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules
by: Li, Binghui, et al.
Published: (2025)
Decentralized Stochastic Nonconvex Optimization under the Relaxed Smoothness
by: Luo, Luo, et al.
Published: (2025)
Fully First-Order Algorithms for Online Bilevel Optimization
by: Jia, Tingkai, et al.
Published: (2026)
Prior-Fitted Networks Scale to Larger Datasets When Treated as Weak Learners
by: Wang, Yuxin, et al.
Published: (2025)
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
by: Malladi, Sadhika, et al.
Published: (2022)
RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
by: Wen, Kaiyue, et al.
Published: (2024)
Measure-Theoretic Anti-Causal Representation Learning
by: Behnam, Arman, et al.
Published: (2025)
Transformers Handle Endogeneity in In-Context Linear Regression
by: Liang, Haodong, et al.
Published: (2024)
Inf2Guard: An Information-Theoretic Framework for Learning Privacy-Preserving Representations against Inference Attacks
by: Noorbakhsh, Sayedeh Leila, et al.
Published: (2024)
On the Clean Generalization and Robust Overfitting in Adversarial Training from Two Theoretical Views: Representation Complexity and Training Dynamics
by: Li, Binghui, et al.
Published: (2023)
Data Mixing Can Induce Phase Transitions in Knowledge Acquisition
by: Gu, Xinran, et al.
Published: (2025)
Theoretical limitations of multi-layer Transformer
by: Chen, Lijie, et al.
Published: (2024)
Achieving Better Local Regret Bound for Online Non-Convex Bilevel Optimization
by: Jia, Tingkai, et al.
Published: (2026)
A Short Information-Theoretic Analysis of Linear Auto-Regressive Learning
by: Ziemann, Ingvar
Published: (2024)
Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
by: Huang, Jing, et al.
Published: (2026)
Bi-Directional Multi-Scale Graph Dataset Condensation via Information Bottleneck
by: Fu, Xingcheng, et al.
Published: (2024)
The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks
by: Lyu, Zhonghao, et al.
Published: (2025)
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
by: Lyu, Kaifeng, et al.
Published: (2023)
FedTilt: Towards Multi-Level Fairness-Preserving and Robust Federated Learning
by: Zhang, Binghui, et al.
Published: (2025)
Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression
by: Chen, Xingwu, et al.
Published: (2025)
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice
by: Wang, Jiachen T., et al.
Published: (2025)
Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations
by: Luo, Haozheng, et al.
Published: (2026)
Shift is Good: Mismatched Data Mixing Improves Test Performance
by: Medvedev, Marko, et al.
Published: (2025)
Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis
by: Li, Hongkang, et al.
Published: (2025)
ElastiFormer: Learned Redundancy Reduction in Transformer via Self-Distillation
by: Liu, Junzhang, et al.
Published: (2024)
Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales
by: Qi, Shuren, et al.
Published: (2024)
Scaling Laws for Precision in High-Dimensional Linear Regression
by: Zhang, Dechen, et al.
Published: (2026)
Towards Understanding Epoch-wise Double descent in Two-layer Linear Neural Networks
by: Olmin, Amanda, et al.
Published: (2024)
High-Dimensional Private Linear Regression with Optimal Rates
by: Bombari, Simone, et al.
Published: (2025)
Characterizing Evolution in Expectation-Maximization Estimates for Overspecified Mixed Linear Regression
by: Luo, Zhankun, et al.
Published: (2025)
Structure-Adaptive Conformal Inference for Large-Scale Out-of-Distribution Testing
by: Sun, Rongyi, et al.
Published: (2026)
The Common Intuition to Transfer Learning Can Win or Lose: Case Studies for Linear Regression
by: Dar, Yehuda, et al.
Published: (2021)
Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits
by: Luo, Yuwei, et al.
Published: (2023)
Scaling Law for Stochastic Gradient Descent in Quadratically Parameterized Linear Regression
by: Ding, Shihong, et al.
Published: (2025)