| Main Author: | Delgrange, Florent |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.23997 |
Similar Items
Deep SPI: Safe Policy Improvement via World Models
by: Delgrange, Florent, et al.
Published: (2025)
E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing
by: Sadhuka, Shuvom, et al.
Published: (2025)
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents
by: Wang, Bowen, et al.
Published: (2026)
D3-Gym: Constructing Real-World Verifiable Environments for Data-Driven Discovery
by: Moussa, Hanane Nour, et al.
Published: (2026)
DeLF: Designing Learning Environments with Foundation Models
by: Afshar, Aida, et al.
Published: (2024)
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
by: Wang, Zhaoyang, et al.
Published: (2026)
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
by: Wen, Yeming, et al.
Published: (2024)
FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards
by: Han, Zhixin, et al.
Published: (2026)
GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training
by: Cao, Yuan, et al.
Published: (2026)
Domain-Adapted Small Language Models for Reliable Clinical Triage
by: Aljohani, Manar, et al.
Published: (2026)
Benchmarking World-Model Learning with Environment-Level Queries
by: Warrier, Archana, et al.
Published: (2025)
AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations
by: Verma, Gaurav, et al.
Published: (2024)
The Evaluation Game: Beyond Static LLM Benchmarking
by: Wang, Paul, et al.
Published: (2026)
Adapting World Models with Latent-State Dynamics Residuals
by: Lanier, JB, et al.
Published: (2025)
Go Beyond Black-box Policies: Rethinking the Design of Learning Agent for Interpretable and Verifiable HVAC Control
by: An, Zhiyu, et al.
Published: (2024)
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
by: Gunjal, Anisha, et al.
Published: (2025)
EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
by: Zala, Abhay, et al.
Published: (2024)
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
by: Rawles, Christopher, et al.
Published: (2024)
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
by: Stojanovski, Zafir, et al.
Published: (2025)
World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry
by: Liu, Yuejiang, et al.
Published: (2026)
Microeconomic Foundations of Multi-Agent Learning
by: Helou, Nassim
Published: (2026)
DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents
by: Zhao, Jiahao, et al.
Published: (2026)
Beyond Static Uncertainty: Modeling Temporal Uncertainty Dynamics for Probabilistic Time Series Forecasting
by: Wang, Yijun, et al.
Published: (2026)
Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents
by: Huang, Jiawei, et al.
Published: (2026)
Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions
by: de la Rosa, Raul, et al.
Published: (2026)
Hack-Verifiable Environments: Towards Evaluating Reward Hacking at Scale
by: Roth, Amit, et al.
Published: (2026)
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
by: Guan, Xinyan, et al.
Published: (2024)
Beyond Binary: Turning Partial Success into Dense Verifiable Rewards for Reinforcement Learning in Code Generation
by: Wang, Longwen, et al.
Published: (2026)
A Guide to Failure in Machine Learning: Reliability and Robustness from Foundations to Practice
by: Heim, Eric, et al.
Published: (2025)
CirrusBench: Evaluating LLM-based Agents Beyond Correctness in Real-World Cloud Service Environments
by: Yu, Yi, et al.
Published: (2026)
External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling
by: Bhagat, Rishav, et al.
Published: (2024)
Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values
by: Ramaswamy, Ashwin, et al.
Published: (2024)
An Interactive Agent Foundation Model
by: Durante, Zane, et al.
Published: (2024)
A Reliable Knowledge Processing Framework for Combustion Science using Foundation Models
by: Sharma, Vansh, et al.
Published: (2023)
Raising the Bar in Graph OOD Generalization: Invariant Learning Beyond Explicit Environment Modeling
by: Shen, Xu, et al.
Published: (2025)
RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
by: Zhang, Zijing, et al.
Published: (2025)
PhoneWorld: Scaling Phone-Use Agent Environments
by: Tang, Zhengyang, et al.
Published: (2026)
Diffusion Transformers as Open-World Spatiotemporal Foundation Models
by: Yuan, Yuan, et al.
Published: (2024)
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
by: Cai, Xin-Qiang, et al.
Published: (2025)
Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning
by: Ding, Zihan, et al.
Published: (2024)