| Main Authors: | Gozeten, Halil Alperen, Ildiz, M. Emrullah, Zhang, Xuechen, Soltanolkotabi, Mahdi, Mondelli, Marco, Oymak, Samet |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.11842 |
Similar Items
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
by: Ildiz, M. Emrullah, et al.
Published: (2024)
Evolutionary Multi-Task Optimization for LLM-Guided Program Discovery
by: Gozeten, Halil Alperen, et al.
Published: (2026)
Learning to Correct: Calibrated Reinforcement Learning for Multi-Attempt Chain-of-Thought
by: Ildiz, Muhammed Emrullah, et al.
Published: (2026)
Continuous Chain of Thought Enables Parallel Exploration and Reasoning
by: Gozeten, Halil Alperen, et al.
Published: (2025)
TimePFN: Effective Multivariate Time Series Forecasting with Synthetic Data
by: Taga, Ege Onur, et al.
Published: (2025)
Attention with Trained Embeddings Provably Selects Important Tokens
by: Wu, Diyuan, et al.
Published: (2025)
Retrieval Augmented Time Series Forecasting
by: Tire, Kutay, et al.
Published: (2024)
From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
by: Ildiz, M. Emrullah, et al.
Published: (2024)
Mechanics of Next Token Prediction with Self-Attention
by: Li, Yingcong, et al.
Published: (2024)
On the Power of Convolution Augmented Transformer
by: Li, Mingchen, et al.
Published: (2024)
Latent Chain-of-Thought Improves Structured-Data Transformers
by: Dudley, Carson, et al.
Published: (2026)
Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
by: Kovačević, Filip, et al.
Published: (2026)
On the Generalization Properties of Selective State-Space Models for Filtering Tasks for Unknown Systems
by: Tang, Alex, et al.
Published: (2026)
Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective
by: Zhang, Xuechen, et al.
Published: (2024)
Selective Attention: Enhancing Transformer through Principled Context Control
by: Zhang, Xuechen, et al.
Published: (2024)
SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG
by: Zhang, Xuechen, et al.
Published: (2025)
BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning
by: Zhang, Xuechen, et al.
Published: (2025)
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
by: Zhang, Xuechen, et al.
Published: (2025)
VSPO: Vector-Steered Policy Optimization for Behavioral Control
by: Zhang, Xuechen, et al.
Published: (2026)
When and How Unlabeled Data Provably Improve In-Context Learning
by: Li, Yingcong, et al.
Published: (2025)
Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
by: Li, Yingcong, et al.
Published: (2024)
Learning to Bet for Horizon-Aware Anytime-Valid Testing
by: Taga, Ege Onur, et al.
Published: (2026)
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
by: Jacot, Arthur, et al.
Published: (2024)
Covariance-Aware Transformers for Quadratic Programming and Decision Making
by: Tire, Kutay, et al.
Published: (2026)
Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning
by: Goel, Gautam, et al.
Published: (2026)
Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks
by: Collins, Liam, et al.
Published: (2023)
Plug-and-Play Transformer Modules for Test-Time Adaptation
by: Chang, Xiangyu, et al.
Published: (2024)
ATHENA: Adaptive Test-Time Steering for Improving Count Fidelity in Diffusion Models
by: Sepehri, Mohammad Shahab, et al.
Published: (2026)
Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning
by: Zhang, Xuechen, et al.
Published: (2024)
Can Transformers Learn Optimal Filtering for Unknown Systems?
by: Balim, Haldun, et al.
Published: (2023)
Universal Lower Bounds and Optimal Rates: Achieving Minimax Clustering Error in Sub-Exponential Mixture Models
by: Dreveton, Maximilien, et al.
Published: (2024)
Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth
by: Kögler, Kevin, et al.
Published: (2024)
Transformers as Support Vector Machines
by: Tarzanagh, Davoud Ataee, et al.
Published: (2023)
Learning to Recall with Transformers Beyond Orthogonal Embeddings
by: Vural, Nuri Mert, et al.
Published: (2026)
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
by: Li, Yingcong, et al.
Published: (2025)
Asymmetric Prompt Weighting for Reinforcement Learning with Verifiable Rewards
by: Heckel, Reinhard, et al.
Published: (2026)
Theoretical Insights into Overparameterized Models in Multi-Task and Replay-Based Continual Learning
by: Banayeeanzade, Amin, et al.
Published: (2024)
Asymptotic Study of In-context Learning with Random Transformers through Equivalent Models
by: Demir, Samet, et al.
Published: (2025)
In-Context Learning Under Regime Change
by: Dudley, Carson, et al.
Published: (2026)
Improved Convergence of Score-Based Diffusion Models via Prediction-Correction
by: Pedrotti, Francesco, et al.
Published: (2023)