:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, LeCheng; Wang, Yuanshi; Shen, Haotian; Wang, Xujie
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.12801

Similar Items

Development and Application of a Monte Carlo Tree Search Algorithm for Simulating Da Vinci Code Game Strategies
by: Zhang, Ye, et al.
Published: (2024)

Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents
by: Mi, Qirui, et al.
Published: (2026)

XiCAD: Camera Activation Detection in the Da Vinci Xi User Interface
by: Jenke, Alexander C., et al.
Published: (2025)

Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement
by: Lian, Yongsheng
Published: (2025)

daVinci-LLM: Towards the Science of Pretraining
by: Qin, Yiwei, et al.
Published: (2026)

A Comparative Study on Code Generation with Transformers
by: Das, Namrata, et al.
Published: (2024)

daVinci-Dev: Agent-native Mid-training for Software Engineering
by: Zeng, Ji, et al.
Published: (2026)

A Survey on Code Generation with LLM-based Agents
by: Dong, Yihong, et al.
Published: (2025)

Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
by: Mai, Xinji, et al.
Published: (2025)

DaVinci at SemEval-2024 Task 9: Few-shot prompting GPT-3.5 for Unconventional Reasoning
by: Mathur, Suyash Vardhan, et al.
Published: (2024)

A Case Study on the Effectiveness of LLMs in Verification with Proof Assistants
by: Bayazıt, Barış, et al.
Published: (2025)

Integrating LTL Constraints into PPO for Safe Reinforcement Learning
by: Zhang, Maifang, et al.
Published: (2026)

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
by: Ye, Hanrong, et al.
Published: (2025)

Towards Repository-Level Program Verification with Large Language Models
by: Zhong, Si Cheng, et al.
Published: (2025)

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Reward Models
by: Zhu, Xiao, et al.
Published: (2026)

D2PPO: Diffusion Policy Policy Optimization with Dispersive Loss
by: Zou, Guowei, et al.
Published: (2025)

Stop Comparing LLM Agents Without Disclosing the Harness
by: Zhang, Yunbei, et al.
Published: (2026)

Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training
by: Gong, Xue, et al.
Published: (2026)

PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?
by: Hua, Dongdong, et al.
Published: (2026)

Executable Code Actions Elicit Better LLM Agents
by: Wang, Xingyao, et al.
Published: (2024)

A Comparative Study of Text Retrieval Models on DaReCzech
by: Stetina, Jakub, et al.
Published: (2024)

DPO Meets PPO: Reinforced Token Optimization for RLHF
by: Zhong, Han, et al.
Published: (2024)

SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
by: Wang, Tianyi, et al.
Published: (2026)

BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks
by: Miao, Rui, et al.
Published: (2025)

MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools
by: Wang, Wenhao, et al.
Published: (2025)

STELLA: Self-Evolving LLM Agent for Biomedical Research
by: Jin, Ruofan, et al.
Published: (2025)

AgriWorld: A World Tools Protocol Framework for Verifiable Agricultural Reasoning with Code-Executing LLM Agents
by: Zhang, Zhixing, et al.
Published: (2026)

A Robust PPO-optimized Tabular Transformer Framework for Intrusion Detection in Industrial IoT Systems
by: She, Yuanya
Published: (2025)

SkillMaster: Toward Autonomous Skill Mastery in LLM Agents
by: Yang, Min, et al.
Published: (2026)

LLM-based Multi-Agent Systems: Techniques and Business Perspectives
by: Yang, Yingxuan, et al.
Published: (2024)

OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling
by: Li, Zhong, et al.
Published: (2026)

ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm
by: Wang, Hanyong, et al.
Published: (2026)

A Comparative Study of LLM-based ASR and Whisper in Low Resource and Code Switching Scenario
by: Song, Zheshu, et al.
Published: (2024)

PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents
by: Wu, Yaozu, et al.
Published: (2025)

$τ^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment
by: Barres, Victor, et al.
Published: (2025)

AgenticTCAD: A LLM-based Multi-Agent Framework for Automated TCAD Code Generation and Device Optimization
by: Fan, Guangxi, et al.
Published: (2025)

DreamProver: Evolving Transferable Lemma Libraries via a Wake-Sleep Theorem-Proving Agent
by: Zhang, Youyuan, et al.
Published: (2026)

AgentInit: Initializing LLM-based Multi-Agent Systems via Diversity and Expertise Orchestration for Effective and Efficient Collaboration
by: Tian, Chunhao, et al.
Published: (2025)

Comparative Analysis of Large Language Models for Context-Aware Code Completion using SAFIM Framework
by: Zhang, Hang, et al.
Published: (2025)

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
by: Zhang, Hanrong, et al.
Published: (2024)