Library Catalog

Bibliographic Details
Main Authors:	Wang, Rui, Lin, Qihan, Liu, Jiayu, Zong, Qing, Zheng, Tianshi, Guo, Dadi, Shi, Haochen, Wang, Weiqi, Song, Yangqiu
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.08992

Similar Items

Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty?
by: Liu, Jiayu, et al.
Published: (2025)

CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?
by: Zong, Qing, et al.
Published: (2025)

NAACL: Noise-AwAre Verbal Confidence Calibration for Robust LLMs in RAG Systems
by: Liu, Jiayu, et al.
Published: (2026)

Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations
by: Li, Chunyang, et al.
Published: (2025)

ComparisonQA: Evaluating Factuality Robustness of LLMs Through Knowledge Frequency Control and Uncertainty
by: Zong, Qing, et al.
Published: (2024)

INFERENCEDYNAMICS: Efficient Routing Across LLMs through Structured Capability and Knowledge Profiling
by: Shi, Haochen, et al.
Published: (2025)

KnowShiftQA: How Robust are RAG Systems when Textbook Knowledge Shifts in K-12 Education?
by: Zheng, Tianshi, et al.
Published: (2024)

Towards Multi-Agent Reasoning Systems for Collaborative Expertise Delegation: An Exploratory Design Study
by: Xu, Baixuan, et al.
Published: (2025)

Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information
by: Yim, Yauwai, et al.
Published: (2024)

Structuring the Unstructured: A Systematic Review of Text-to-Structure Generation for Agentic AI with a Universal Evaluation Framework
by: Deng, Zheye, et al.
Published: (2025)

MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset
by: Wang, Weiqi, et al.
Published: (2024)

DixitWorld: Evaluating Multimodal Abductive Reasoning in Vision-Language Models with Multi-Agent Dixit Gameplay
by: Mo, Yunxiang, et al.
Published: (2025)

The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning
by: Zheng, Tianshi, et al.
Published: (2025)

KNOWCOMP POKEMON Team at DialAM-2024: A Two-Stage Pipeline for Detecting Relations in Dialogical Argument Mining
by: Zheng, Zihao, et al.
Published: (2024)

From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery
by: Zheng, Tianshi, et al.
Published: (2025)

The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas
by: Xu, Baixuan, et al.
Published: (2025)

Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance
by: Hu, Wenbin, et al.
Published: (2025)

LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game
by: Liang, Fangzhou, et al.
Published: (2025)

EcomEdit: An Automated E-commerce Knowledge Editing Framework for Enhanced Product and Purchase Intention Understanding
by: Lau, Ching Ming Samuel, et al.
Published: (2024)

Decomposing Epistemic Uncertainty for Causal Decision Making
by: Rahman, Md Musfiqur, et al.
Published: (2026)

Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction
by: Deng, Zheye, et al.
Published: (2024)

AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph
by: Wang, Zhaowei, et al.
Published: (2023)

GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory
by: Fan, Wei, et al.
Published: (2024)

LogiDynamics: Unraveling the Dynamics of Inductive, Abductive and Deductive Logical Inferences in LLM Reasoning
by: Zheng, Tianshi, et al.
Published: (2025)

Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach
by: Zong, Chen-Chen, et al.
Published: (2025)

Decision Making under Cumulative Prospect Theory: An Alternating Direction Method of Multipliers
by: Cui, Xiangyu, et al.
Published: (2022)

Rethinking Aleatoric and Epistemic Uncertainty
by: Smith, Freddie Bickford, et al.
Published: (2024)

Acquiring and Modelling Abstract Commonsense Knowledge via Conceptualization
by: He, Mutian, et al.
Published: (2022)

Towards Subgraph Isomorphism Counting with Graph Kernels
by: Liu, Xin, et al.
Published: (2024)

ConKE: Conceptualization-Augmented Knowledge Editing in Large Language Models for Commonsense Reasoning
by: Zhang, Liyu, et al.
Published: (2024)

Advancing Abductive Reasoning in Knowledge Graphs through Complex Logical Hypothesis Generation
by: Bai, Jiaxin, et al.
Published: (2023)

Rethinking Complex Queries on Knowledge Graphs with Neural Link Predictors
by: Yin, Hang, et al.
Published: (2023)

Legal Rule Induction: Towards Generalizable Principle Discovery from Analogous Judicial Precedents
by: Fan, Wei, et al.
Published: (2025)

Chain-of-Choice Hierarchical Policy Learning for Conversational Recommendation
by: Fan, Wei, et al.
Published: (2023)

CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge
by: Zheng, Tianshi, et al.
Published: (2024)

Decision Making under Deep Uncertainty
Published: (2020)

Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?
by: Wang, Qineng, et al.
Published: (2024)

Enhancing Transformers for Generalizable First-Order Logical Entailment
by: Zheng, Tianshi, et al.
Published: (2025)

arXiv2Table: Toward Realistic Benchmarking and Evaluation for LLM-Based Literature-Review Table Generation
by: Wang, Weiqi, et al.
Published: (2025)

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization
by: Jing, Huihao, et al.
Published: (2026)