| Main Authors: | Li, Shiyu, Tang, Yang, Wang, Yifan, Li, Peiming, Chen, Xi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.00568 |
Similar Items
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
by: Wang, Yifan, et al.
Published: (2026)
SE-Search: Self-Evolving Search Agent via Memory and Dense Reward
by: Li, Jian, et al.
Published: (2026)
Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing
by: Tang, Yang, et al.
Published: (2025)
Instructive Dialogue Summarization with Query Aggregations
by: Wang, Bin, et al.
Published: (2023)
Conan-embedding: General Text Embedding with More and Better Negative Samples
by: Li, Shiyu, et al.
Published: (2024)
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
by: Zhang, Dan, et al.
Published: (2024)
Multi-Agent Consensus Seeking via Large Language Models
by: Chen, Huaben, et al.
Published: (2023)
ExpSeek: Self-Triggered Experience Seeking for Web Agents
by: Zhang, Wenyuan, et al.
Published: (2026)
InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents
by: Du, Yaxin, et al.
Published: (2025)
WideSearch: Benchmarking Agentic Broad Info-Seeking
by: Wong, Ryan, et al.
Published: (2025)
Retrieval, Reward, and Training Protocols: What Matters in Training Search Agents?
by: Zhao, Yibo, et al.
Published: (2026)
Unlocking Instructive In-Context Learning with Tabular Prompting for Relational Triple Extraction
by: Li, Guozheng, et al.
Published: (2024)
Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions
by: Kim, Taehyeon, et al.
Published: (2023)
Conan-Embedding-v2: Training an LLM from Scratch for Text Embeddings
by: Li, Shiyu, et al.
Published: (2025)
SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation
by: Hu, Minda, et al.
Published: (2024)
InfoAgent: Advancing Autonomous Information-Seeking Agents
by: Zhang, Gongrui, et al.
Published: (2025)
EvolveSearch: An Iterative Self-Evolving Search Agent
by: Zhang, Dingchu, et al.
Published: (2025)
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
by: Wang, Peisong, et al.
Published: (2025)
ReFINE: A Reward-Based Framework for Interpretable and Nuanced Evaluation of Radiology Report Generation
by: Liu, Yunyi, et al.
Published: (2024)
Sparse Rewards Can Self-Train Dialogue Agents
by: Lattimer, Barrett Martin, et al.
Published: (2024)
Triviality Corrected Endogenous Reward
by: Wang, Xinda, et al.
Published: (2026)
Re-ReST: Reflection-Reinforced Self-Training for Language Agents
by: Dou, Zi-Yi, et al.
Published: (2024)
AgentCollab: A Self-Evaluation-Driven Collaboration Paradigm for Efficient LLM Agents
by: Gao, Wenbo, et al.
Published: (2026)
ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning
by: Wu, Juncheng, et al.
Published: (2026)
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
by: Cai, Shaofei, et al.
Published: (2025)
Beyond Retrieval-Ranking: A Multi-Agent Cognitive Decision Framework for E-Commerce Search
by: Zhai, Zhouwei, et al.
Published: (2025)
ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding
by: Sun, Zhongxiang, et al.
Published: (2025)
Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling
by: Li, Junyi, et al.
Published: (2024)
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
by: Zhang, Jiajie, et al.
Published: (2026)
A Comprehensive Graph Framework for Question Answering with Mode-Seeking Preference Alignment
by: Tang, Quanwei, et al.
Published: (2025)
Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios
by: Zang, Jianxiang, et al.
Published: (2025)
The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement
by: Wang, Xiaobo, et al.
Published: (2026)
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
by: Chen, Mingyang, et al.
Published: (2025)
Aligning Deep Implicit Preferences by Learning to Reason Defensively
by: Li, Peiming, et al.
Published: (2025)
Self-Correction Makes LLMs Better Parsers
by: Zhang, Ziyan, et al.
Published: (2025)
Contrastive Learning on LLM Back Generation Treebank for Cross-domain Constituency Parsing
by: Guo, Peiming, et al.
Published: (2025)
DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking
by: Lan, Tian, et al.
Published: (2025)
URPO: A Unified Reward & Policy Optimization Framework for Large Language Models
by: Lu, Songshuo, et al.
Published: (2025)
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
by: Li, Lei, et al.
Published: (2024)
ARIA: Training Language Agents with Intention-Driven Reward Aggregation
by: Yang, Ruihan, et al.
Published: (2025)