| Main Authors: | Zhang, LeCheng, Wang, Yuanshi, Shen, Haotian, Wang, Xujie |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.12801 |
Similar Items
Development and Application of a Monte Carlo Tree Search Algorithm for Simulating Da Vinci Code Game Strategies
by: Zhang, Ye, et al.
Published: (2024)
Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents
by: Mi, Qirui, et al.
Published: (2026)
XiCAD: Camera Activation Detection in the Da Vinci Xi User Interface
by: Jenke, Alexander C., et al.
Published: (2025)
Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement
by: Lian, Yongsheng
Published: (2025)
daVinci-LLM: Towards the Science of Pretraining
by: Qin, Yiwei, et al.
Published: (2026)
A Comparative Study on Code Generation with Transformers
by: Das, Namrata, et al.
Published: (2024)
daVinci-Dev: Agent-native Mid-training for Software Engineering
by: Zeng, Ji, et al.
Published: (2026)
A Survey on Code Generation with LLM-based Agents
by: Dong, Yihong, et al.
Published: (2025)
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
by: Mai, Xinji, et al.
Published: (2025)
DaVinci at SemEval-2024 Task 9: Few-shot prompting GPT-3.5 for Unconventional Reasoning
by: Mathur, Suyash Vardhan, et al.
Published: (2024)
A Case Study on the Effectiveness of LLMs in Verification with Proof Assistants
by: Bayazıt, Barış, et al.
Published: (2025)
Integrating LTL Constraints into PPO for Safe Reinforcement Learning
by: Zhang, Maifang, et al.
Published: (2026)
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
by: Ye, Hanrong, et al.
Published: (2025)
Towards Repository-Level Program Verification with Large Language Models
by: Zhong, Si Cheng, et al.
Published: (2025)
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Reward Models
by: Zhu, Xiao, et al.
Published: (2026)
D2PPO: Diffusion Policy Policy Optimization with Dispersive Loss
by: Zou, Guowei, et al.
Published: (2025)
Stop Comparing LLM Agents Without Disclosing the Harness
by: Zhang, Yunbei, et al.
Published: (2026)
Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training
by: Gong, Xue, et al.
Published: (2026)
PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?
by: Hua, Dongdong, et al.
Published: (2026)
Executable Code Actions Elicit Better LLM Agents
by: Wang, Xingyao, et al.
Published: (2024)
A Comparative Study of Text Retrieval Models on DaReCzech
by: Stetina, Jakub, et al.
Published: (2024)
DPO Meets PPO: Reinforced Token Optimization for RLHF
by: Zhong, Han, et al.
Published: (2024)
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
by: Wang, Tianyi, et al.
Published: (2026)
BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks
by: Miao, Rui, et al.
Published: (2025)
MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools
by: Wang, Wenhao, et al.
Published: (2025)
STELLA: Self-Evolving LLM Agent for Biomedical Research
by: Jin, Ruofan, et al.
Published: (2025)
AgriWorld: A World Tools Protocol Framework for Verifiable Agricultural Reasoning with Code-Executing LLM Agents
by: Zhang, Zhixing, et al.
Published: (2026)
A Robust PPO-optimized Tabular Transformer Framework for Intrusion Detection in Industrial IoT Systems
by: She, Yuanya
Published: (2025)
SkillMaster: Toward Autonomous Skill Mastery in LLM Agents
by: Yang, Min, et al.
Published: (2026)
LLM-based Multi-Agent Systems: Techniques and Business Perspectives
by: Yang, Yingxuan, et al.
Published: (2024)
OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling
by: Li, Zhong, et al.
Published: (2026)
ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm
by: Wang, Hanyong, et al.
Published: (2026)
A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario
by: Song, Zheshu, et al.
Published: (2024)
PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents
by: Wu, Yaozu, et al.
Published: (2025)
$τ^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment
by: Barres, Victor, et al.
Published: (2025)
AgenticTCAD: A LLM-based Multi-Agent Framework for Automated TCAD Code Generation and Device Optimization
by: Fan, Guangxi, et al.
Published: (2025)
DreamProver: Evolving Transferable Lemma Libraries via a Wake-Sleep Theorem-Proving Agent
by: Zhang, Youyuan, et al.
Published: (2026)
AgentInit: Initializing LLM-based Multi-Agent Systems via Diversity and Expertise Orchestration for Effective and Efficient Collaboration
by: Tian, Chunhao, et al.
Published: (2025)
Comparative Analysis of Large Language Models for Context-Aware Code Completion using SAFIM Framework
by: Zhang, Hang, et al.
Published: (2025)
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
by: Zhang, Hanrong, et al.
Published: (2024)