| Main Authors: | Zhang, Xuechen, Huang, Zijian, Li, Yingcong, Ni, Chenshun, Chen, Jiasi, Oymak, Samet |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.17211 |
Similar Items
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
by: Zhang, Xuechen, et al.
Published: (2025)
VSPO: Vector-Steered Policy Optimization for Behavioral Control
by: Zhang, Xuechen, et al.
Published: (2026)
Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective
by: Zhang, Xuechen, et al.
Published: (2024)
Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning
by: Zhang, Xuechen, et al.
Published: (2024)
SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG
by: Zhang, Xuechen, et al.
Published: (2025)
Selective Attention: Enhancing Transformer through Principled Context Control
by: Zhang, Xuechen, et al.
Published: (2024)
On the Power of Convolution Augmented Transformer
by: Li, Mingchen, et al.
Published: (2024)
Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
by: Li, Yingcong, et al.
Published: (2024)
Transformers as Support Vector Machines
by: Tarzanagh, Davoud Ataee, et al.
Published: (2023)
From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
by: Ildiz, M. Emrullah, et al.
Published: (2024)
Continuous Chain of Thought Enables Parallel Exploration and Reasoning
by: Gozeten, Halil Alperen, et al.
Published: (2025)
Mechanics of Next Token Prediction with Self-Attention
by: Li, Yingcong, et al.
Published: (2024)
Latent Chain-of-Thought Improves Structured-Data Transformers
by: Dudley, Carson, et al.
Published: (2026)
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
by: Li, Yingcong, et al.
Published: (2025)
Bridging SFT and RL: Dynamic Policy Optimization for Robust Reasoning
by: Zhu, Taojie, et al.
Published: (2026)
Test-Time Training Provably Improves Transformers as In-context Learners
by: Gozeten, Halil Alperen, et al.
Published: (2025)
Evolutionary Multi-Task Optimization for LLM-Guided Program Discovery
by: Gozeten, Halil Alperen, et al.
Published: (2026)
When and How Unlabeled Data Provably Improve In-Context Learning
by: Li, Yingcong, et al.
Published: (2025)
L3GS: Layered 3D Gaussian Splats for Efficient 3D Scene Delivery
by: Tsai, Yi-Zhen, et al.
Published: (2025)
Learning to Bet for Horizon-Aware Anytime-Valid Testing
by: Taga, Ege Onur, et al.
Published: (2026)
Covariance-Aware Transformers for Quadratic Programming and Decision Making
by: Tire, Kutay, et al.
Published: (2026)
RL Fine-Tuning Heals OOD Forgetting in SFT
by: Jin, Hangzhan, et al.
Published: (2025)
Attention with Trained Embeddings Provably Selects Important Tokens
by: Wu, Diyuan, et al.
Published: (2025)
TimePFN: Effective Multivariate Time Series Forecasting with Synthetic Data
by: Taga, Ege Onur, et al.
Published: (2025)
In-Context Learning Under Regime Change
by: Dudley, Carson, et al.
Published: (2026)
Can Transformers Learn Optimal Filtering for Unknown Systems?
by: Balim, Haldun, et al.
Published: (2023)
SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning
by: Limozin, Alexis, et al.
Published: (2026)
Stable and Efficient Single-Rollout RL for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)
QuRL: Efficient Reinforcement Learning with Quantized Rollout
by: Li, Yuhang, et al.
Published: (2026)
Retrieval Augmented Time Series Forecasting
by: Tire, Kutay, et al.
Published: (2024)
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
by: Liu, Zihan, et al.
Published: (2025)
EchoRL: Reinforcement Learning via Rollout Echoing
by: Bi, Jinhe, et al.
Published: (2026)
Learning to Correct: Calibrated Reinforcement Learning for Multi-Attempt Chain-of-Thought
by: Ildiz, Muhammed Emrullah, et al.
Published: (2026)
Heddle: A Distributed Orchestration System for Agentic RL Rollout
by: Zhang, Zili, et al.
Published: (2026)
Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
by: Qiu, Haibo, et al.
Published: (2025)
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
by: Ildiz, M. Emrullah, et al.
Published: (2024)
RL makes MLLMs see better than SFT
by: Song, Junha, et al.
Published: (2025)
Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
by: Kang, Feiyang, et al.
Published: (2025)
SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts
by: Liu, Bingshuai, et al.
Published: (2025)
Patch the Distribution Mismatch: RL Rewriting Agent for Stable Off-Policy SFT
by: Wang, Jiacheng, et al.
Published: (2026)