| Main Authors: | Zhang, Mengyu, Ding, Siyu, Yin, Weichong, Sun, Yu, Wu, Hua |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.02463 |
Similar Items
KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
by: Yu, Linhao, et al.
Published: (2026)
RLPR: Extrapolating RLVR to General Domains without Verifiers
by: Yu, Tianyu, et al.
Published: (2025)
IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage
by: Li, Yuhan, et al.
Published: (2026)
Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT
by: Liu, Yesheng, et al.
Published: (2025)
Before the Model Learns the Bug: Fuzzing RLVR Verifiers
by: Ray, Jaideep
Published: (2026)
DéjàQ: Open-Ended Evolution of Diverse, Learnable and Verifiable Problems
by: Röpke, Willem, et al.
Published: (2026)
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
by: Xie, Tianbao, et al.
Published: (2024)
Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward
by: Guo, Weiyang, et al.
Published: (2026)
JADE: Expert-Grounded Dynamic Evaluation for Open-Ended Professional Tasks
by: Lin, Lanbo, et al.
Published: (2026)
Aletheia: What Makes RLVR For Code Verifiers Tick?
by: Venkatkrishna, Vatsal, et al.
Published: (2026)
Open-Ended Task Discovery via Bayesian Optimization
by: Adachi, Masaki, et al.
Published: (2026)
Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation
by: Zhai, Shaopeng, et al.
Published: (2023)
On Creativity and Open-Endedness
by: Soros, L. B., et al.
Published: (2024)
LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking
by: Helff, Lukas, et al.
Published: (2026)
Tru-POMDP: Task Planning Under Uncertainty via Tree of Hypotheses and Open-Ended POMDPs
by: Tang, Wenjing, et al.
Published: (2025)
Orthogonal Finetuning for Direct Preference Optimization
by: Yang, Chenxu, et al.
Published: (2024)
RM-PoT: Reformulating Mathematical Problems and Solving via Program of Thoughts
by: Zhang, Yu, et al.
Published: (2025)
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math
by: Pandit, Shrey, et al.
Published: (2025)
GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero
by: Yin, Shangjian, et al.
Published: (2026)
Weights-Rotated Preference Optimization for Large Language Models
by: Yang, Chenxu, et al.
Published: (2025)
RLVR-World: Training World Models with Reinforcement Learning
by: Wu, Jialong, et al.
Published: (2025)
PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment
by: Oh, Jihwan, et al.
Published: (2026)
Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks
by: Li, Haotian, et al.
Published: (2026)
Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification
by: Guo, Yuxuan, et al.
Published: (2024)
Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing
by: Weng, Zhaotian, et al.
Published: (2026)
Differentiating Choices via Commonality for Multiple-Choice Question Answering
by: Deng, Wenqing, et al.
Published: (2024)
Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task
by: Choi, Jungmin, et al.
Published: (2026)
Embodied World Models Emerge from Navigational Task in Open-Ended Environments
by: Jin, Li, et al.
Published: (2025)
InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training
by: Wang, Pengkai, et al.
Published: (2025)
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
by: Zhang, Qiang, et al.
Published: (2026)
Pessimistic Verification for Open Ended Math Questions
by: Huang, Yanxing, et al.
Published: (2025)
GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks
by: Yang, Saelyne, et al.
Published: (2026)
EAPO: Entropy-Driven Adaptive Positive-Negative Sample Weighting for Policy Optimization in Open-Ended QA
by: Zeng, Yunsheng, et al.
Published: (2026)
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization
by: Ding, Li, et al.
Published: (2023)
Does RLVR Extend Reasoning Boundaries? Investigating Capability Expansion in Vision-Language Models
by: Shen, Minghe, et al.
Published: (2025)
Reverse-Engineered Reasoning for Open-Ended Generation
by: Wang, Haozhe, et al.
Published: (2025)
The Multiple Ticket Hypothesis: Random Sparse Subnetworks Suffice for RLVR
by: Adewuyi, Israel, et al.
Published: (2026)
MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation
by: Li, Yu, et al.
Published: (2024)
A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds
by: Cui, Christopher Z., et al.
Published: (2024)
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
by: Matthews, Michael, et al.
Published: (2024)