Library Catalog

Bibliographic Details
Main Authors:	Gozeten, Halil Alperen; Ildiz, M. Emrullah; Zhang, Xuechen; Soltanolkotabi, Mahdi; Mondelli, Marco; Oymak, Samet
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2503.11842

Similar Items

High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
by: Ildiz, M. Emrullah, et al.
Published: (2024)

Evolutionary Multi-Task Optimization for LLM-Guided Program Discovery
by: Gozeten, Halil Alperen, et al.
Published: (2026)

Learning to Correct: Calibrated Reinforcement Learning for Multi-Attempt Chain-of-Thought
by: Ildiz, Muhammed Emrullah, et al.
Published: (2026)

Continuous Chain of Thought Enables Parallel Exploration and Reasoning
by: Gozeten, Halil Alperen, et al.
Published: (2025)

TimePFN: Effective Multivariate Time Series Forecasting with Synthetic Data
by: Taga, Ege Onur, et al.
Published: (2025)

Attention with Trained Embeddings Provably Selects Important Tokens
by: Wu, Diyuan, et al.
Published: (2025)

Retrieval Augmented Time Series Forecasting
by: Tire, Kutay, et al.
Published: (2024)

From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
by: Ildiz, M. Emrullah, et al.
Published: (2024)

Mechanics of Next Token Prediction with Self-Attention
by: Li, Yingcong, et al.
Published: (2024)

On the Power of Convolution Augmented Transformer
by: Li, Mingchen, et al.
Published: (2024)

Latent Chain-of-Thought Improves Structured-Data Transformers
by: Dudley, Carson, et al.
Published: (2026)

Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
by: Kovačević, Filip, et al.
Published: (2026)

On the Generalization Properties of Selective State-Space Models for Filtering Tasks for Unknown Systems
by: Tang, Alex, et al.
Published: (2026)

Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective
by: Zhang, Xuechen, et al.
Published: (2024)

Selective Attention: Enhancing Transformer through Principled Context Control
by: Zhang, Xuechen, et al.
Published: (2024)

SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG
by: Zhang, Xuechen, et al.
Published: (2025)

BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning
by: Zhang, Xuechen, et al.
Published: (2025)

Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
by: Zhang, Xuechen, et al.
Published: (2025)

VSPO: Vector-Steered Policy Optimization for Behavioral Control
by: Zhang, Xuechen, et al.
Published: (2026)

When and How Unlabeled Data Provably Improve In-Context Learning
by: Li, Yingcong, et al.
Published: (2025)

Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
by: Li, Yingcong, et al.
Published: (2024)

Learning to Bet for Horizon-Aware Anytime-Valid Testing
by: Taga, Ege Onur, et al.
Published: (2026)

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
by: Jacot, Arthur, et al.
Published: (2024)

Covariance-Aware Transformers for Quadratic Programming and Decision Making
by: Tire, Kutay, et al.
Published: (2026)

Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning
by: Goel, Gautam, et al.
Published: (2026)

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks
by: Collins, Liam, et al.
Published: (2023)

Plug-and-Play Transformer Modules for Test-Time Adaptation
by: Chang, Xiangyu, et al.
Published: (2024)

ATHENA: Adaptive Test-Time Steering for Improving Count Fidelity in Diffusion Models
by: Sepehri, Mohammad Shahab, et al.
Published: (2026)

Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning
by: Zhang, Xuechen, et al.
Published: (2024)

Can Transformers Learn Optimal Filtering for Unknown Systems?
by: Balim, Haldun, et al.
Published: (2023)

Universal Lower Bounds and Optimal Rates: Achieving Minimax Clustering Error in Sub-Exponential Mixture Models
by: Dreveton, Maximilien, et al.
Published: (2024)

Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth
by: Kögler, Kevin, et al.
Published: (2024)

Transformers as Support Vector Machines
by: Tarzanagh, Davoud Ataee, et al.
Published: (2023)

Learning to Recall with Transformers Beyond Orthogonal Embeddings
by: Vural, Nuri Mert, et al.
Published: (2026)

Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
by: Li, Yingcong, et al.
Published: (2025)

Asymmetric Prompt Weighting for Reinforcement Learning with Verifiable Rewards
by: Heckel, Reinhard, et al.
Published: (2026)

Theoretical Insights into Overparameterized Models in Multi-Task and Replay-Based Continual Learning
by: Banayeeanzade, Amin, et al.
Published: (2024)

Asymptotic Study of In-context Learning with Random Transformers through Equivalent Models
by: Demir, Samet, et al.
Published: (2025)

In-Context Learning Under Regime Change
by: Dudley, Carson, et al.
Published: (2026)

Improved Convergence of Score-Based Diffusion Models via Prediction-Correction
by: Pedrotti, Francesco, et al.
Published: (2023)