| Main Author: | Song, Meng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.15693 |
Similar Items
When and Why Does Unsupervised RL Succeed in Mathematical Reasoning? A Manifold Envelopment Perspective
by: Zhang, Zelin, et al.
Published: (2026)
FADE: Why Bad Descriptions Happen to Good Features
by: Puri, Bruno, et al.
Published: (2025)
Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL
by: Suau, Miguel, et al.
Published: (2023)
Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning
by: Wu, Xiefeng, et al.
Published: (2025)
Why LLMs Are Bad at Synthetic Table Generation (and what to do about it)
by: Xu, Shengzhe, et al.
Published: (2024)
VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study
by: Zhang, Zhicheng, et al.
Published: (2026)
Mimicking Better by Matching the Approximate Action Distribution
by: Ramos, João A. Cândido, et al.
Published: (2023)
Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training
by: Lu, Aojun, et al.
Published: (2026)
RL Token: Bootstrapping Online RL with Vision-Language-Action Models
by: Xu, Charles, et al.
Published: (2026)
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
by: Driess, Danny, et al.
Published: (2025)
When Good Equations Get Bad Scores: Improving Symbolic Regression Through Better Parameter Optimization
by: Wang, Boxiao, et al.
Published: (2026)
The Infinite-Dimensional Nature of Spectroscopy and Why Models Succeed, Fail, and Mislead
by: Michelucci, Umberto, et al.
Published: (2026)
Good Allocations from Bad Estimates
by: Casacuberta, Sílvia, et al.
Published: (2026)
ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation
by: Hou, Yupeng, et al.
Published: (2025)
Scalable Offline Model-Based RL with Action Chunks
by: Park, Kwanyoung, et al.
Published: (2025)
Stochastic Gradient Succeeds for Bandits
by: Mei, Jincheng, et al.
Published: (2024)
Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations
by: Singh, Anima, et al.
Published: (2023)
Behavior Generation with Latent Actions
by: Lee, Seungjae, et al.
Published: (2024)
Recommender Systems for Good (RS4Good): Survey of Use Cases and a Call to Action for Research that Matters
by: Jannach, Dietmar, et al.
Published: (2024)
$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
by: Chen, Kang, et al.
Published: (2025)
From Actions to Words: Towards Abstractive-Textual Policy Summarization in RL
by: Admoni, Sahar, et al.
Published: (2025)
SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning
by: Li, Xuyang, et al.
Published: (2025)
Benchmarking the Generality of Vision-Language-Action Models
by: Guruprasad, Pranav, et al.
Published: (2025)
Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning
by: Li, Jingyang, et al.
Published: (2024)
Learning to Generate All Feasible Actions
by: Theile, Mirco, et al.
Published: (2023)
Assessing the Zero-Shot Capabilities of LLMs for Action Evaluation in RL
by: Pignatelli, Eduardo, et al.
Published: (2024)
Vertical Federated Learning in Practice: The Good, the Bad, and the Ugly
by: Wu, Zhaomin, et al.
Published: (2025)
When Bad Data Leads to Good Models
by: Li, Kenneth, et al.
Published: (2025)
Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed
by: Gupta, Isha, et al.
Published: (2025)
$\varepsilon$-Good Action Identification in Fixed-Budget Monte Carlo Tree Search
by: Li, Yinan, et al.
Published: (2026)
Action-Adaptive Continual Learning: Enabling Policy Generalization under Dynamic Action Spaces
by: Pan, Chaofan, et al.
Published: (2025)
HIQL: Offline Goal-Conditioned RL with Latent States as Actions
by: Park, Seohong, et al.
Published: (2023)
Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs
by: Chen, Jialin, et al.
Published: (2026)
Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
by: Bolland, Adrien, et al.
Published: (2024)
The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training
by: Sahoo, Subramanyam
Published: (2025)
Sequence-Aware Inline Measurement Attribution for Good-Bad Wafer Diagnosis
by: Miyaguchi, Kohei, et al.
Published: (2025)
Agent Performing Autonomous Stock Trading under Good and Bad Situations
by: Luo, Yunfei, et al.
Published: (2023)
Action-Inspired Generative Models
by: A., Eshwar R., et al.
Published: (2026)
Action-Free Offline-to-Online RL via Discretised State Policies
by: Neggatu, Natinael Solomon, et al.
Published: (2026)
Learning to Reason as Action Abstractions with Scalable Mid-Training RL
by: Zhang, Shenao, et al.
Published: (2025)