| Main Authors: | Wang, Pengcheng, Huang, Jerry, Yao, Jiarui, Pan, Rui, Niu, Peizhi, Liu, Yaowenqi, Wang, Ruida, Lu, Renhao, Guo, Yuwei, Zhang, Tong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.13346 |
Similar Items
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4
by: Yao, Jiarui, et al.
Published: (2025)
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
by: Wang, Ruida, et al.
Published: (2025)
GUIDE: Towards Scalable Advising for Research Ideas
by: Liu, Yaowenqi, et al.
Published: (2025)
PhysProver: Advancing Automatic Theorem Proving for Physics
by: Zhang, Hanning, et al.
Published: (2026)
Active Prompting with Chain-of-Thought for Large Language Models
by: Diao, Shizhe, et al.
Published: (2023)
Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
by: Kuang, Peng, et al.
Published: (2025)
Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents
by: Al-Tawaha, Ahmad, et al.
Published: (2026)
CodeVisionary: An Agent-based Framework for Evaluating Large Language Models in Code Generation
by: Wang, Xinchen, et al.
Published: (2025)
SPEX: Scaling Feature Interaction Explanations for LLMs
by: Kang, Justin Singh, et al.
Published: (2025)
AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents
by: Jin, Jiarui, et al.
Published: (2026)
FLEX: Expert-level False-Less EXecution Metric for Reliable Text-to-SQL Benchmark
by: Kim, Heegyu, et al.
Published: (2024)
HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation
by: Zhang, Shijie, et al.
Published: (2025)
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
by: Wang, Ruida, et al.
Published: (2024)
From Correction to Mastery: Reinforced Distillation of Large Language Model Agents
by: Lyu, Yuanjie, et al.
Published: (2025)
Recursive Multi-Agent Systems
by: Yang, Xiyuan, et al.
Published: (2026)
ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs
by: Butler, Landon, et al.
Published: (2025)
CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation
by: Li, Renhao, et al.
Published: (2024)
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
by: Wang, Zihan, et al.
Published: (2025)
Natural-Language Agent Harnesses
by: Pan, Linyue, et al.
Published: (2026)
Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability
by: Wang, Ruida, et al.
Published: (2025)
Code as Agent Harness
by: Ning, Xuying, et al.
Published: (2026)
Terminal-World: Scaling Terminal-Agent Environments via Agent Skills
by: Cheng, Zihao, et al.
Published: (2026)
Molly: Making Large Language Model Agents Solve Python Problem More Logically
by: Xiao, Rui, et al.
Published: (2024)
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
by: Xi, Zhiheng, et al.
Published: (2024)
Enhancing Dialogue State Tracking Models through LLM-backed User-Agents Simulation
by: Niu, Cheng, et al.
Published: (2024)
MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving
by: Wang, Ruida, et al.
Published: (2025)
DeepAgent: A General Reasoning Agent with Scalable Toolsets
by: Li, Xiaoxi, et al.
Published: (2025)
MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning
by: Shen, Jingyan, et al.
Published: (2025)
KwaiAgents: Generalized Information-seeking Agent System with Large Language Models
by: Pan, Haojie, et al.
Published: (2023)
Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation
by: Chen, Jiaju, et al.
Published: (2025)
Agent-SafetyBench: Evaluating the Safety of LLM Agents
by: Zhang, Zhexin, et al.
Published: (2024)
Structured Agent Distillation for Large Language Model
by: Liu, Jun, et al.
Published: (2025)
SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent
by: Ji, Jiarui, et al.
Published: (2024)
Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models
by: Jiang, Haitao, et al.
Published: (2026)
AgentA/B: Automated and Scalable Web A/BTesting with Interactive LLM Agents
by: Lu, Yuxuan, et al.
Published: (2025)
TrajAgent: An LLM-Agent Framework for Trajectory Modeling via Large-and-Small Model Collaboration
by: Du, Yuwei, et al.
Published: (2024)
RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
by: Xu, Huiyu, et al.
Published: (2024)
Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback
by: Hu, Mengkang, et al.
Published: (2025)
AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning
by: Hu, Yuyang, et al.
Published: (2026)
DrunkAgent: Stealthy Memory Corruption in LLM-Powered Recommender Agents
by: Yang, Shiyi, et al.
Published: (2025)