| Main Authors: | Wang, Rui, Lin, Qihan, Liu, Jiayu, Zong, Qing, Zheng, Tianshi, Guo, Dadi, Shi, Haochen, Wang, Weiqi, Song, Yangqiu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.08992 |
Similar Items
Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?
by: Liu, Jiayu, et al.
Published: (2025)
CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?
by: Zong, Qing, et al.
Published: (2025)
NAACL: Noise-AwAre Verbal Confidence Calibration for Robust LLMs in RAG Systems
by: Liu, Jiayu, et al.
Published: (2026)
Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations
by: Li, Chunyang, et al.
Published: (2025)
ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty
by: Zong, Qing, et al.
Published: (2024)
INFERENCEDYNAMICS: Efficient Routing Across LLMs through Structured Capability and Knowledge Profiling
by: Shi, Haochen, et al.
Published: (2025)
KnowShiftQA: How Robust are RAG Systems when Textbook Knowledge Shifts in K-12 Education?
by: Zheng, Tianshi, et al.
Published: (2024)
Towards Multi-Agent Reasoning Systems for Collaborative Expertise Delegation: An Exploratory Design Study
by: Xu, Baixuan, et al.
Published: (2025)
Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information
by: Yim, Yauwai, et al.
Published: (2024)
Structuring the Unstructured: A Systematic Review of Text-to-Structure Generation for Agentic AI with a Universal Evaluation Framework
by: Deng, Zheye, et al.
Published: (2025)
MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset
by: Wang, Weiqi, et al.
Published: (2024)
DixitWorld: Evaluating Multimodal Abductive Reasoning in Vision-Language Models with Multi-Agent Dixit Gameplay
by: Mo, Yunxiang, et al.
Published: (2025)
The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning
by: Zheng, Tianshi, et al.
Published: (2025)
KNOWCOMP POKEMON Team at DialAM-2024: A Two-Stage Pipeline for Detecting Relations in Dialogical Argument Mining
by: Zheng, Zihao, et al.
Published: (2024)
From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery
by: Zheng, Tianshi, et al.
Published: (2025)
The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas
by: Xu, Baixuan, et al.
Published: (2025)
Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance
by: Hu, Wenbin, et al.
Published: (2025)
LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game
by: Liang, Fangzhou, et al.
Published: (2025)
EcomEdit: An Automated E-commerce Knowledge Editing Framework for Enhanced Product and Purchase Intention Understanding
by: Lau, Ching Ming Samuel, et al.
Published: (2024)
Decomposing Epistemic Uncertainty for Causal Decision Making
by: Rahman, Md Musfiqur, et al.
Published: (2026)
Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction
by: Deng, Zheye, et al.
Published: (2024)
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph
by: Wang, Zhaowei, et al.
Published: (2023)
GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory
by: Fan, Wei, et al.
Published: (2024)
LogiDynamics: Unraveling the Dynamics of Inductive, Abductive and Deductive Logical Inferences in LLM Reasoning
by: Zheng, Tianshi, et al.
Published: (2025)
Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach
by: Zong, Chen-Chen, et al.
Published: (2025)
Decision Making under Cumulative Prospect Theory: An Alternating Direction Method of Multipliers
by: Cui, Xiangyu, et al.
Published: (2022)
Rethinking Aleatoric and Epistemic Uncertainty
by: Smith, Freddie Bickford, et al.
Published: (2024)
Acquiring and Modelling Abstract Commonsense Knowledge via Conceptualization
by: He, Mutian, et al.
Published: (2022)
Towards Subgraph Isomorphism Counting with Graph Kernels
by: Liu, Xin, et al.
Published: (2024)
ConKE: Conceptualization-Augmented Knowledge Editing in Large Language Models for Commonsense Reasoning
by: Zhang, Liyu, et al.
Published: (2024)
Advancing Abductive Reasoning in Knowledge Graphs through Complex Logical Hypothesis Generation
by: Bai, Jiaxin, et al.
Published: (2023)
Rethinking Complex Queries on Knowledge Graphs with Neural Link Predictors
by: Yin, Hang, et al.
Published: (2023)
Legal Rule Induction: Towards Generalizable Principle Discovery from Analogous Judicial Precedents
by: Fan, Wei, et al.
Published: (2025)
Chain-of-Choice Hierarchical Policy Learning for Conversational Recommendation
by: Fan, Wei, et al.
Published: (2023)
CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge
by: Zheng, Tianshi, et al.
Published: (2024)
Decision Making under Deep Uncertainty
Published: (2020)
Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?
by: Wang, Qineng, et al.
Published: (2024)
Enhancing Transformers for Generalizable First-Order Logical Entailment
by: Zheng, Tianshi, et al.
Published: (2025)
arXiv2Table: Toward Realistic Benchmarking and Evaluation for LLM-Based Literature-Review Table Generation
by: Wang, Weiqi, et al.
Published: (2025)
PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization
by: Jing, Huihao, et al.
Published: (2026)