| Main Authors: | Zhang, Hongbo, Cui, Han, Wang, Yidong, Tian, Yijian, Guo, Qi, Wang, Cunxiang, Wu, Jian, Song, Chiyu, Zhang, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.21900 |
Similar Items
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
by: Wang, Yidong, et al.
Published: (2025)
How Likely Do LLMs with CoT Mimic Human Reasoning?
by: Bao, Guangsheng, et al.
Published: (2024)
Knowledge Conflicts for LLMs: A Survey
by: Xu, Rongwu, et al.
Published: (2024)
AutoSurvey: Large Language Models Can Automatically Write Surveys
by: Wang, Yidong, et al.
Published: (2024)
A Survey on Evaluation of Large Language Models
by: Chang, Yupeng, et al.
Published: (2023)
Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future
by: Wang, Yidong, et al.
Published: (2025)
Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values
by: Zhang, Hongbo, et al.
Published: (2025)
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
by: Wang, Yidong, et al.
Published: (2023)
Nash CoT: Multi-Path Inference with Preference Equilibrium
by: Zhang, Ziqi, et al.
Published: (2024)
Unlocking Recursive Thinking of LLMs: Alignment via Refinement
by: Zhang, Haoke, et al.
Published: (2025)
StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows
by: Wu, Yiran, et al.
Published: (2024)
CycleResearcher: Improving Automated Research via Automated Review
by: Weng, Yixuan, et al.
Published: (2024)
DeepSurvey-Bench: Evaluating Academic Value of Automatically Generated Scientific Survey
by: Zhang, Guo-Biao, et al.
Published: (2026)
Deep Research: A Systematic Survey
by: Shi, Zhengliang, et al.
Published: (2025)
TrustDataFilter: Leveraging Trusted Knowledge Base Data for More Effective Filtering of Unknown Information
by: Zhang, Jinghong, et al.
Published: (2025)
$R^3$: "This is My SQL, Are You With Me?" A Consensus-Based Multi-Agent System for Text-to-SQL Tasks
by: Xia, Hanchen, et al.
Published: (2024)
Reasoning on Multiple Needles In A Haystack
by: Wang, Yidong
Published: (2025)
SEW: Self-Evolving Agentic Workflows for Automated Code Generation
by: Liu, Siwei, et al.
Published: (2025)
Demystifying Instruction Mixing for Fine-tuning Large Language Models
by: Wang, Renxi, et al.
Published: (2023)
TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces
by: Yang, Shu-Xun, et al.
Published: (2026)
AFlow: Automating Agentic Workflow Generation
by: Zhang, Jiayi, et al.
Published: (2024)
LongSafety: Evaluating Long-Context Safety of Large Language Models
by: Lu, Yida, et al.
Published: (2025)
Detecting RLVR Training Data via Structural Convergence of Reasoning
by: Zhang, Hongbo, et al.
Published: (2026)
Aligning AI Research with the Needs of Clinical Coding Workflows: Eight Recommendations Based on US Data Analysis and Critical Review
by: Gan, Yidong, et al.
Published: (2024)
UDA: Unsupervised Debiasing Alignment for Pair-wise LLM-as-a-Judge
by: Zhang, Yang, et al.
Published: (2025)
Self-DC: When to Reason and When to Act? Self Divide-and-Conquer for Compositional Unknown Questions
by: Wang, Hongru, et al.
Published: (2024)
AutoFlow: Automated Workflow Generation for Large Language Model Agents
by: Li, Zelong, et al.
Published: (2024)
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?
by: Chen, Haolin, et al.
Published: (2026)
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
by: Cao, Ruisheng, et al.
Published: (2024)
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
by: Yu, Zhuohao, et al.
Published: (2024)
SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation
by: Liu, Xiaoze, et al.
Published: (2024)
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application
by: Yang, Chuanpeng, et al.
Published: (2024)
Distilling Text Style Transfer With Self-Explanation From LLMs
by: Zhang, Chiyu, et al.
Published: (2024)
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
by: Chen, Luoxin, et al.
Published: (2025)
AutoSurvey2: Empowering Researchers with Next Level Automated Literature Surveys
by: Wu, Siyi, et al.
Published: (2025)
SAC-Opt: Semantic Anchors for Iterative Correction in Optimization Modeling
by: Zhang, Yansen, et al.
Published: (2025)
Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey
by: Zhang, Weixu, et al.
Published: (2023)
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents
by: Yue, Ling, et al.
Published: (2026)
RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation
by: Zhou, Changzhi, et al.
Published: (2025)
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models
by: Fan, Shengda, et al.
Published: (2024)