| Main Authors: | Xu, Jiaqi, Huang, Tao, Zhang, Kai |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.00611 |
Similar Items
Estimating the Self-Consistency of LLMs
by: Nowak, Robert
Published: (2025)
SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition
by: Wu, Mengsong, et al.
Published: (2025)
ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction
by: Li, Pengze, et al.
Published: (2025)
Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models
by: Zhou, Xin, et al.
Published: (2025)
PosterGen: Aesthetic-Aware Multi-Modal Paper-to-Poster Generation via Multi-Agent LLMs
by: Zhang, Zhilin, et al.
Published: (2025)
The Consistency-Acceptability Divergence of LLMs in Judicial Decision-Making: Task and Stakeholder Dimensions
by: MingDa, Zhang, et al.
Published: (2025)
Self-Training Meets Consistency: Improving LLMs' Reasoning with Consistency-Driven Rationale Evaluation
by: Lee, Jaehyeok, et al.
Published: (2024)
Can LLMs Effectively Leverage Graph Structural Information through Prompts, and Why?
by: Huang, Jin, et al.
Published: (2023)
Confidence Improves Self-Consistency in LLMs
by: Taubenfeld, Amir, et al.
Published: (2025)
Evaluating Multimodal Large Language Models with Daily Composite Tasks in Home Environments
by: Zhang, Zhenliang, et al.
Published: (2025)
Self-Evaluating LLMs for Multi-Step Tasks: Stepwise Confidence Estimation for Failure Detection
by: Mavi, Vaibhav, et al.
Published: (2025)
Improving Multi-turn Dialogue Consistency with Self-Recall Thinking
by: Pang, Renning, et al.
Published: (2026)
TPS-Bench: Evaluating AI Agents' Tool Planning & Scheduling Abilities in Compounding Tasks
by: Xu, Hanwen, et al.
Published: (2025)
SmartBench: Evaluating LLMs in Smart Homes with Anomalous Device States and Behavioral Contexts
by: Zou, Qingsong, et al.
Published: (2026)
CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds
by: Wang, Lei, et al.
Published: (2024)
Evaluating LLMs for Visualization Tasks
by: Khan, Saadiq Rauf, et al.
Published: (2025)
Fast and Distributed Equivariant Graph Neural Networks by Virtual Node Learning
by: Zhang, Yuelin, et al.
Published: (2025)
Evaluating Role-Consistency in LLMs for Counselor Training
by: Rudolph, Eric, et al.
Published: (2026)
SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving
by: Hou, Yujie, et al.
Published: (2025)
Beyond Self-Consistency: Ensemble Reasoning Boosts Consistency and Accuracy of LLMs in Cancer Staging
by: Chang, Chia-Hsuan, et al.
Published: (2024)
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback
by: Li, Youquan, et al.
Published: (2024)
Improving the Reliability of LLMs: Combining CoT, RAG, Self-Consistency, and Self-Verification
by: Kumar, Adarsh, et al.
Published: (2025)
Self-Supervised Multi-Object Tracking with Path Consistency
by: Lu, Zijia, et al.
Published: (2024)
XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs
by: Kabir, Mohsinul, et al.
Published: (2026)
Evaluating the Evaluator: Measuring LLMs' Adherence to Task Evaluation Instructions
by: Murugadoss, Bhuvanashree, et al.
Published: (2024)
Learning to Compress Graphs via Dual Agents for Consistent Topological Robustness Evaluation
by: Chai, Qisen, et al.
Published: (2025)
PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks
by: Liu, Yunuo, et al.
Published: (2025)
Efficient Multi-Task Learning via Generalist Recommender
by: Wang, Luyang, et al.
Published: (2025)
Training-free Composite Scene Generation for Layout-to-Image Synthesis
by: Liu, Jiaqi, et al.
Published: (2024)
Spatial Computing Communications for Multi-User Virtual Reality in Distributed Mobile Edge Computing Network
by: Xu, Caolu, et al.
Published: (2025)
AXCEL: Automated eXplainable Consistency Evaluation using LLMs
by: Sreekar, P Aditya, et al.
Published: (2024)
Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs
by: Cheng, Ruoxi, et al.
Published: (2024)
MorphAgent: Empowering Agents through Self-Evolving Profiles and Decentralized Collaboration
by: Lu, Siyuan, et al.
Published: (2024)
Assessing Image Inpainting via Re-Inpainting Self-Consistency Evaluation
by: Chen, Tianyi, et al.
Published: (2024)
Multi-Task Learning with LLMs for Implicit Sentiment Analysis: Data-level and Task-level Automatic Weight Learning
by: Lai, Wenna, et al.
Published: (2024)
Trust Your Memory: Verifiable Control of Smart Homes through Reinforcement Learning with Multi-dimensional Rewards
by: Guo, Kai-Yuan, et al.
Published: (2026)
Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits
by: Zhang, Xiang, et al.
Published: (2025)
STED and Consistency Scoring: A Framework for Evaluating LLM Structured Output Reliability
by: Wang, Guanghui, et al.
Published: (2025)
ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks
by: Shukla, Arth, et al.
Published: (2024)
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
by: Zhang, Xiaoying, et al.
Published: (2024)