| Main Authors: | Lara, Luis, Milios, Aristides, Luo, Zhi Hao, Sharma, Aditya, Luo, Ge Ya, Beckham, Christopher, Golemo, Florian, Pal, Christopher |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.14117 |
Similar Items
DStruct2Design: Data and Benchmarks for Data Structure Driven Generative Floor Plan Design
by: Luo, Zhi Hao, et al.
Published: (2024)
Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes
by: Gosselin, Anthony, et al.
Published: (2025)
LLMs can learn self-restraint through iterative self-reflection
by: Piché, Alexandre, et al.
Published: (2024)
Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion
by: Luo, Ge Ya, et al.
Published: (2024)
Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality
by: Luo, Ge Ya, et al.
Published: (2024)
Exploring validation metrics for offline model-based optimisation with diffusion models
by: Beckham, Christopher, et al.
Published: (2022)
CtRL-Sim: Reactive and Controllable Driving Agents with Offline Reinforcement Learning
by: Rowe, Luke, et al.
Published: (2024)
Robust Guided Diffusion for Offline Black-Box Optimization
by: Chen, Can Sam, et al.
Published: (2024)
Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval
by: Sharma, Aditya, et al.
Published: (2025)
Indignados en España e Indecisos en Polonia. La inspiración española en el contexto polaco y el fracaso de la protesta en el país de “Solidarnosć”
by: Karolina Golemo
Published: (2014)
ROSA: Random Subspace Adaptation for Efficient Fine-Tuning
by: Hameed, Marawan Gamal Abdel, et al.
Published: (2024)
Reinforced Embodied Planning with Verifiable Reward for Real-World Robotic Manipulation
by: Bo, Zitong, et al.
Published: (2025)
Meeting Legal Challenges. The School Leader's Library: Leading for Learning Series.
by: Beckham, Joseph
Published: (1996)
Incentivizing Parametric Knowledge via Reinforcement Learning with Verifiable Rewards for Cross-Cultural Entity Translation
by: Zhou, Jiang, et al.
Published: (2026)
Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)
Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards
by: Liu, Shuze Daniel, et al.
Published: (2026)
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
by: Li, Hongyu, et al.
Published: (2025)
PRISM: A Unified Framework for Post-Training LLMs Without Verifiable Rewards
by: Ghimire, Mukesh, et al.
Published: (2026)
GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models
by: Sharma, Aditya, et al.
Published: (2024)
Minerva: Reinforcement Learning with Verifiable Rewards for Cyber Threat Intelligence LLMs
by: Alam, Md Tanvirul, et al.
Published: (2026)
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
by: Bensal, Shelly, et al.
Published: (2025)
Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards
by: Zhang, Xin, et al.
Published: (2026)
On the Generalization Gap in LLM Planning: Tests and Verifier-Reward RL
by: Belcamino, Valerio, et al.
Published: (2026)
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
by: Wang, Peisong, et al.
Published: (2025)
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
by: Wen, Xumeng, et al.
Published: (2025)
From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation
by: Jiang, Yuxin, et al.
Published: (2026)
HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation
by: Klimenko, Nikita, et al.
Published: (2026)
Outcome Rewards Do Not Guarantee Verifiable or Causally Important Reasoning
by: Yu, Qinan, et al.
Published: (2026)
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
by: Cai, Xin-Qiang, et al.
Published: (2025)
Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward
by: Wen, Xuexiang, et al.
Published: (2026)
Text-to-Layout: A Generative Workflow for Drafting Architectural Floor Plans Using LLMs
by: Duggempudi, Jayakrishna, et al.
Published: (2025)
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
by: Liu, Shudong, et al.
Published: (2025)
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
by: Liang, Hao, et al.
Published: (2022)
Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs
by: Cho, Dongkyu Derek, et al.
Published: (2025)
Conocimiento termodinámico sociales, y sus límites ecológicos
by: Camarena, Beckham, et al.
Published: (2025)
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)
Lessons from Training Grounded LLMs with Verifiable Rewards
by: Sim, Shang Hong, et al.
Published: (2025)
Auditing Data Membership in Reinforcement Learning With Verifiable Rewards
by: Liu, Yule, et al.
Published: (2025)
Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards
by: Tang, Xinyu, et al.
Published: (2025)
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
by: Gunjal, Anisha, et al.
Published: (2025)