:: Library Catalog

Bibliographic Details
Main Author:	Song, Meng
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2503.15693

Similar Items

When and Why Does Unsupervised RL Succeed in Mathematical Reasoning? A Manifold Envelopment Perspective
by: Zhang, Zelin, et al.
Published: (2026)

FADE: Why Bad Descriptions Happen to Good Features
by: Puri, Bruno, et al.
Published: (2025)

Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL
by: Suau, Miguel, et al.
Published: (2023)

Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning
by: Wu, Xiefeng, et al.
Published: (2025)

Why LLMs Are Bad at Synthetic Table Generation (and what to do about it)
by: Xu, Shengzhe, et al.
Published: (2024)

VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study
by: Zhang, Zhicheng, et al.
Published: (2026)

Mimicking Better by Matching the Approximate Action Distribution
by: Ramos, João A. Cândido, et al.
Published: (2023)

Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training
by: Lu, Aojun, et al.
Published: (2026)

RL Token: Bootstrapping Online RL with Vision-Language-Action Models
by: Xu, Charles, et al.
Published: (2026)

Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
by: Driess, Danny, et al.
Published: (2025)

When Good Equations Get Bad Scores: Improving Symbolic Regression Through Better Parameter Optimization
by: Wang, Boxiao, et al.
Published: (2026)

The Infinite-Dimensional Nature of Spectroscopy and Why Models Succeed, Fail, and Mislead
by: Michelucci, Umberto, et al.
Published: (2026)

Good Allocations from Bad Estimates
by: Casacuberta, Sílvia, et al.
Published: (2026)

ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation
by: Hou, Yupeng, et al.
Published: (2025)

Scalable Offline Model-Based RL with Action Chunks
by: Park, Kwanyoung, et al.
Published: (2025)

Stochastic Gradient Succeeds for Bandits
by: Mei, Jincheng, et al.
Published: (2024)

Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations
by: Singh, Anima, et al.
Published: (2023)

Behavior Generation with Latent Actions
by: Lee, Seungjae, et al.
Published: (2024)

Recommender Systems for Good (RS4Good): Survey of Use Cases and a Call to Action for Research that Matters
by: Jannach, Dietmar, et al.
Published: (2024)

$π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
by: Chen, Kang, et al.
Published: (2025)

From Actions to Words: Towards Abstractive-Textual Policy Summarization in RL
by: Admoni, Sahar, et al.
Published: (2025)

SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning
by: Li, Xuyang, et al.
Published: (2025)

Benchmarking the Generality of Vision-Language-Action Models
by: Guruprasad, Pranav, et al.
Published: (2025)

Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning
by: Li, Jingyang, et al.
Published: (2024)

Learning to Generate All Feasible Actions
by: Theile, Mirco, et al.
Published: (2023)

Assessing the Zero-Shot Capabilities of LLMs for Action Evaluation in RL
by: Pignatelli, Eduardo, et al.
Published: (2024)

Vertical Federated Learning in Practice: The Good, the Bad, and the Ugly
by: Wu, Zhaomin, et al.
Published: (2025)

When Bad Data Leads to Good Models
by: Li, Kenneth, et al.
Published: (2025)

Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed
by: Gupta, Isha, et al.
Published: (2025)

$\varepsilon$-Good Action Identification in Fixed-Budget Monte Carlo Tree Search
by: Li, Yinan, et al.
Published: (2026)

Action-Adaptive Continual Learning: Enabling Policy Generalization under Dynamic Action Spaces
by: Pan, Chaofan, et al.
Published: (2025)

HIQL: Offline Goal-Conditioned RL with Latent States as Actions
by: Park, Seohong, et al.
Published: (2023)

Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs
by: Chen, Jialin, et al.
Published: (2026)

Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures
by: Bolland, Adrien, et al.
Published: (2024)

The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training
by: Sahoo, Subramanyam
Published: (2025)

Sequence-Aware Inline Measurement Attribution for Good-Bad Wafer Diagnosis
by: Miyaguchi, Kohei, et al.
Published: (2025)

Agent Performing Autonomous Stock Trading under Good and Bad Situations
by: Luo, Yunfei, et al.
Published: (2023)

Action-Inspired Generative Models
by: A., Eshwar R., et al.
Published: (2026)

Action-Free Offline-to-Online RL via Discretised State Policies
by: Neggatu, Natinael Solomon, et al.
Published: (2026)

Learning to Reason as Action Abstractions with Scalable Mid-Training RL
by: Zhang, Shenao, et al.
Published: (2025)