:: Library Catalog

Bibliographic Details
Main Authors:	Yan, Tingkai; Wen, Haodong; Li, Binghui; Luo, Kairong; Chen, Wenguang; Lyu, Kaifeng
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2511.13421

Similar Items

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
by: Luo, Kairong, et al.
Published: (2025)

How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
by: Luo, Kairong, et al.
Published: (2025)

Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold
by: Li, Xinghan, et al.
Published: (2025)

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
by: Li, Binghui, et al.
Published: (2024)

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
by: Li, Binghui, et al.
Published: (2024)

PCMind-2.1-Kaiyuan-2B Technical Report
by: Luo, Kairong, et al.
Published: (2025)

Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules
by: Li, Binghui, et al.
Published: (2025)

Decentralized Stochastic Nonconvex Optimization under the Relaxed Smoothness
by: Luo, Luo, et al.
Published: (2025)

Fully First-Order Algorithms for Online Bilevel Optimization
by: Jia, Tingkai, et al.
Published: (2026)

Prior-Fitted Networks Scale to Larger Datasets When Treated as Weak Learners
by: Wang, Yuxin, et al.
Published: (2025)

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
by: Malladi, Sadhika, et al.
Published: (2022)

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
by: Wen, Kaiyue, et al.
Published: (2024)

Measure-Theoretic Anti-Causal Representation Learning
by: Behnam, Arman, et al.
Published: (2025)

Transformers Handle Endogeneity in In-Context Linear Regression
by: Liang, Haodong, et al.
Published: (2024)

Inf2Guard: An Information-Theoretic Framework for Learning Privacy-Preserving Representations against Inference Attacks
by: Noorbakhsh, Sayedeh Leila, et al.
Published: (2024)

On the Clean Generalization and Robust Overfitting in Adversarial Training from Two Theoretical Views: Representation Complexity and Training Dynamics
by: Li, Binghui, et al.
Published: (2023)

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition
by: Gu, Xinran, et al.
Published: (2025)

Theoretical limitations of multi-layer Transformer
by: Chen, Lijie, et al.
Published: (2024)

Achieving Better Local Regret Bound for Online Non-Convex Bilevel Optimization
by: Jia, Tingkai, et al.
Published: (2026)

A Short Information-Theoretic Analysis of Linear Auto-Regressive Learning
by: Ziemann, Ingvar
Published: (2024)

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
by: Huang, Jing, et al.
Published: (2026)

Bi-Directional Multi-Scale Graph Dataset Condensation via Information Bottleneck
by: Fu, Xingcheng, et al.
Published: (2024)

The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks
by: Lyu, Zhonghao, et al.
Published: (2025)

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
by: Lyu, Kaifeng, et al.
Published: (2023)

FedTilt: Towards Multi-Level Fairness-Preserving and Robust Federated Learning
by: Zhang, Binghui, et al.
Published: (2025)

Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression
by: Chen, Xingwu, et al.
Published: (2025)

Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice
by: Wang, Jiachen T., et al.
Published: (2025)

Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations
by: Luo, Haozheng, et al.
Published: (2026)

Shift is Good: Mismatched Data Mixing Improves Test Performance
by: Medvedev, Marko, et al.
Published: (2025)

Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis
by: Li, Hongkang, et al.
Published: (2025)

ElastiFormer: Learned Redundancy Reduction in Transformer via Self-Distillation
by: Liu, Junzhang, et al.
Published: (2024)

Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales
by: Qi, Shuren, et al.
Published: (2024)

Scaling Laws for Precision in High-Dimensional Linear Regression
by: Zhang, Dechen, et al.
Published: (2026)

Towards Understanding Epoch-wise Double descent in Two-layer Linear Neural Networks
by: Olmin, Amanda, et al.
Published: (2024)

High-Dimensional Private Linear Regression with Optimal Rates
by: Bombari, Simone, et al.
Published: (2025)

Characterizing Evolution in Expectation-Maximization Estimates for Overspecified Mixed Linear Regression
by: Luo, Zhankun, et al.
Published: (2025)

Structure-Adaptive Conformal Inference for Large-Scale Out-of-Distribution Testing
by: Sun, Rongyi, et al.
Published: (2026)

The Common Intuition to Transfer Learning Can Win or Lose: Case Studies for Linear Regression
by: Dar, Yehuda, et al.
Published: (2021)

Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits
by: Luo, Yuwei, et al.
Published: (2023)

Scaling Law for Stochastic Gradient Descent in Quadratically Parameterized Linear Regression
by: Ding, Shihong, et al.
Published: (2025)