Library Catalog

Bibliographic Details
Main Authors:	Wang, Pengcheng, Huang, Jerry, Yao, Jiarui, Pan, Rui, Niu, Peizhi, Liu, Yaowenqi, Wang, Ruida, Lu, Renhao, Guo, Yuwei, Zhang, Tong
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.13346

Similar Items

FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4
by: Yao, Jiarui, et al.
Published: (2025)

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
by: Wang, Ruida, et al.
Published: (2025)

GUIDE: Towards Scalable Advising for Research Ideas
by: Liu, Yaowenqi, et al.
Published: (2025)

PhysProver: Advancing Automatic Theorem Proving for Physics
by: Zhang, Hanning, et al.
Published: (2026)

Active Prompting with Chain-of-Thought for Large Language Models
by: Diao, Shizhe, et al.
Published: (2023)

Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
by: Kuang, Peng, et al.
Published: (2025)

Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents
by: Al-Tawaha, Ahmad, et al.
Published: (2026)

CodeVisionary: An Agent-based Framework for Evaluating Large Language Models in Code Generation
by: Wang, Xinchen, et al.
Published: (2025)

SPEX: Scaling Feature Interaction Explanations for LLMs
by: Kang, Justin Singh, et al.
Published: (2025)

AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents
by: Jin, Jiarui, et al.
Published: (2026)

FLEX: Expert-level False-Less EXecution Metric for Reliable Text-to-SQL Benchmark
by: Kim, Heegyu, et al.
Published: (2024)

HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation
by: Zhang, Shijie, et al.
Published: (2025)

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
by: Wang, Ruida, et al.
Published: (2024)

From Correction to Mastery: Reinforced Distillation of Large Language Model Agents
by: Lyu, Yuanjie, et al.
Published: (2025)

Recursive Multi-Agent Systems
by: Yang, Xiyuan, et al.
Published: (2026)

ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs
by: Butler, Landon, et al.
Published: (2025)

CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation
by: Li, Renhao, et al.
Published: (2024)

Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
by: Wang, Zihan, et al.
Published: (2025)

Natural-Language Agent Harnesses
by: Pan, Linyue, et al.
Published: (2026)

Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability
by: Wang, Ruida, et al.
Published: (2025)

Code as Agent Harness
by: Ning, Xuying, et al.
Published: (2026)

Terminal-World: Scaling Terminal-Agent Environments via Agent Skills
by: Cheng, Zihao, et al.
Published: (2026)

Molly: Making Large Language Model Agents Solve Python Problem More Logically
by: Xiao, Rui, et al.
Published: (2024)

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
by: Xi, Zhiheng, et al.
Published: (2024)

Enhancing Dialogue State Tracking Models through LLM-backed User-Agents Simulation
by: Niu, Cheng, et al.
Published: (2024)

MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving
by: Wang, Ruida, et al.
Published: (2025)

DeepAgent: A General Reasoning Agent with Scalable Toolsets
by: Li, Xiaoxi, et al.
Published: (2025)

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning
by: Shen, Jingyan, et al.
Published: (2025)

KwaiAgents: Generalized Information-seeking Agent System with Large Language Models
by: Pan, Haojie, et al.
Published: (2023)

Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation
by: Chen, Jiaju, et al.
Published: (2025)

Agent-SafetyBench: Evaluating the Safety of LLM Agents
by: Zhang, Zhexin, et al.
Published: (2024)

Structured Agent Distillation for Large Language Model
by: Liu, Jun, et al.
Published: (2025)

SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent
by: Ji, Jiarui, et al.
Published: (2024)

Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models
by: Jiang, Haitao, et al.
Published: (2026)

AgentA/B: Automated and Scalable Web A/BTesting with Interactive LLM Agents
by: Lu, Yuxuan, et al.
Published: (2025)

TrajAgent: An LLM-Agent Framework for Trajectory Modeling via Large-and-Small Model Collaboration
by: Du, Yuwei, et al.
Published: (2024)

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
by: Xu, Huiyu, et al.
Published: (2024)

Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback
by: Hu, Mengkang, et al.
Published: (2025)

AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning
by: Hu, Yuyang, et al.
Published: (2026)

DrunkAgent: Stealthy Memory Corruption in LLM-Powered Recommender Agents
by: Yang, Shiyi, et al.
Published: (2025)