:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Xuechen, Huang, Zijian, Li, Yingcong, Ni, Chenshun, Chen, Jiasi, Oymak, Samet
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2506.17211

Similar Items

Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
by: Zhang, Xuechen, et al.
Published: (2025)

VSPO: Vector-Steered Policy Optimization for Behavioral Control
by: Zhang, Xuechen, et al.
Published: (2026)

Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective
by: Zhang, Xuechen, et al.
Published: (2024)

Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning
by: Zhang, Xuechen, et al.
Published: (2024)

SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG
by: Zhang, Xuechen, et al.
Published: (2025)

Selective Attention: Enhancing Transformer through Principled Context Control
by: Zhang, Xuechen, et al.
Published: (2024)

On the Power of Convolution Augmented Transformer
by: Li, Mingchen, et al.
Published: (2024)

Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
by: Li, Yingcong, et al.
Published: (2024)

Transformers as Support Vector Machines
by: Tarzanagh, Davoud Ataee, et al.
Published: (2023)

From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
by: Ildiz, M. Emrullah, et al.
Published: (2024)

Continuous Chain of Thought Enables Parallel Exploration and Reasoning
by: Gozeten, Halil Alperen, et al.
Published: (2025)

Mechanics of Next Token Prediction with Self-Attention
by: Li, Yingcong, et al.
Published: (2024)

Latent Chain-of-Thought Improves Structured-Data Transformers
by: Dudley, Carson, et al.
Published: (2026)

Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
by: Li, Yingcong, et al.
Published: (2025)

Bridging SFT and RL: Dynamic Policy Optimization for Robust Reasoning
by: Zhu, Taojie, et al.
Published: (2026)

Test-Time Training Provably Improves Transformers as In-context Learners
by: Gozeten, Halil Alperen, et al.
Published: (2025)

Evolutionary Multi-Task Optimization for LLM-Guided Program Discovery
by: Gozeten, Halil Alperen, et al.
Published: (2026)

When and How Unlabeled Data Provably Improve In-Context Learning
by: Li, Yingcong, et al.
Published: (2025)

L3GS: Layered 3D Gaussian Splats for Efficient 3D Scene Delivery
by: Tsai, Yi-Zhen, et al.
Published: (2025)

Learning to Bet for Horizon-Aware Anytime-Valid Testing
by: Taga, Ege Onur, et al.
Published: (2026)

Covariance-Aware Transformers for Quadratic Programming and Decision Making
by: Tire, Kutay, et al.
Published: (2026)

RL Fine-Tuning Heals OOD Forgetting in SFT
by: Jin, Hangzhan, et al.
Published: (2025)

Attention with Trained Embeddings Provably Selects Important Tokens
by: Wu, Diyuan, et al.
Published: (2025)

TimePFN: Effective Multivariate Time Series Forecasting with Synthetic Data
by: Taga, Ege Onur, et al.
Published: (2025)

In-Context Learning Under Regime Change
by: Dudley, Carson, et al.
Published: (2026)

Can Transformers Learn Optimal Filtering for Unknown Systems?
by: Balim, Haldun, et al.
Published: (2023)

SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning
by: Limozin, Alexis, et al.
Published: (2026)

Stable and Efficient Single-Rollout RL for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)

QuRL: Efficient Reinforcement Learning with Quantized Rollout
by: Li, Yuhang, et al.
Published: (2026)

Retrieval Augmented Time Series Forecasting
by: Tire, Kutay, et al.
Published: (2024)

AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
by: Liu, Zihan, et al.
Published: (2025)

EchoRL: Reinforcement Learning via Rollout Echoing
by: Bi, Jinhe, et al.
Published: (2026)

Learning to Correct: Calibrated Reinforcement Learning for Multi-Attempt Chain-of-Thought
by: Ildiz, Muhammed Emrullah, et al.
Published: (2026)

Heddle: A Distributed Orchestration System for Agentic RL Rollout
by: Zhang, Zili, et al.
Published: (2026)

Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
by: Qiu, Haibo, et al.
Published: (2025)

High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
by: Ildiz, M. Emrullah, et al.
Published: (2024)

RL makes MLLMs see better than SFT
by: Song, Junha, et al.
Published: (2025)

Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
by: Kang, Feiyang, et al.
Published: (2025)

SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts
by: Liu, Bingshuai, et al.
Published: (2025)

Patch the Distribution Mismatch: RL Rewriting Agent for Stable Off-Policy SFT
by: Wang, Jiacheng, et al.
Published: (2026)