| Main Authors: | Yin, Zhenyun, Wang, Shujie, Wang, Xuhong, Ma, Xingjun, Wang, Yinchun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.16727 |
Similar Items
Constrained Intrinsic Motivation for Reinforcement Learning
by: Zheng, Xiang, et al.
Published: (2024)
DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning
by: Hao, Chuzhan, et al.
Published: (2025)
Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making
by: Ma, Shuai, et al.
Published: (2024)
Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration
by: Wang, Sen, et al.
Published: (2026)
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
by: Xu, Ran, et al.
Published: (2025)
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
by: Wang, Chaojie, et al.
Published: (2024)
VERIRL: Boosting the LLM-based Verilog Code Generation via Reinforcement Learning
by: Teng, Fu, et al.
Published: (2025)
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
by: Song, Huatong, et al.
Published: (2025)
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
by: Song, Huatong, et al.
Published: (2025)
Deliberative Dynamics and Value Alignment in LLM Debates
by: Sachdeva, Pratik S., et al.
Published: (2025)
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning
by: Yuan, Qianhao, et al.
Published: (2025)
LLM4EFFI: Leveraging Large Language Models to Enhance Code Efficiency and Correctness
by: Ye, Tong, et al.
Published: (2025)
BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models
by: Li, Yuanhao, et al.
Published: (2026)
Improving LLM Code Generation via Requirement-Aware Curriculum Reinforcement Learning
by: Yin, Shouyu, et al.
Published: (2026)
From Order to Distribution: A Spectral Characterization of Forgetting in Continual Learning
by: Xu, Zonghuan, et al.
Published: (2026)
Aligning Large Language Models with Searcher Preferences
by: Wu, Wei, et al.
Published: (2026)
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting
by: Ye, Tong, et al.
Published: (2024)
Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning
by: Rui, Shaohao, et al.
Published: (2025)
Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward
by: Deng, Yong, et al.
Published: (2025)
Dynamic Adversarial Reinforcement Learning for Robust Multimodal Large Language Models
by: Bao, Yicheng, et al.
Published: (2026)
LightSearcher: Efficient DeepSearch via Experiential Memory
by: Lan, Hengzhi, et al.
Published: (2025)
Beyond Surface Judgments: Human-Grounded Risk Evaluation of LLM-Generated Disinformation
by: Xu, Zonghuan, et al.
Published: (2026)
DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation
by: Gu, Tianjun, et al.
Published: (2025)
Privacy Risks of LLM-Empowered Recommender Systems: An Inversion Attack Perspective
by: Wang, Yubo, et al.
Published: (2025)
Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety
by: Jin, Can, et al.
Published: (2026)
BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning
by: Li, Yuan, et al.
Published: (2026)
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
by: Yue, Yu, et al.
Published: (2025)
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
by: Qin, Tianrui, et al.
Published: (2025)
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
by: Chen, Zehui, et al.
Published: (2024)
LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding
by: Hu, Yuxuan, et al.
Published: (2025)
PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading
by: Wu, Yutao, et al.
Published: (2025)
SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis
by: Sun, Shuang, et al.
Published: (2025)
Efficient Differentiable Causal Discovery via Reliable Super-Structure Learning
by: Ma, Pingchuan, et al.
Published: (2026)
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
by: Li, Yige, et al.
Published: (2024)
Tree Search for LLM Agent Reinforcement Learning
by: Ji, Yuxiang, et al.
Published: (2025)
A Problem-Oriented Perspective and Anchor Verification for Code Optimization
by: Ye, Tong, et al.
Published: (2024)
AutoBackdoor: Automating Backdoor Attacks via LLM Agents
by: Li, Yige, et al.
Published: (2025)
Deliberative Reasoning Network: An Uncertainty-Driven Paradigm for Belief-Tracked Inference with Pretrained Language Models
by: Xu, Anran, et al.
Published: (2025)
Harnessing LLM for Noise-Robust Cognitive Diagnosis in Web-Based Intelligent Education Systems
by: Zhang, Guixian, et al.
Published: (2025)
AudioMosaic: Contrastive Masked Audio Representation Learning
by: Huang, Hanxun, et al.
Published: (2026)