Library Catalog

Bibliographic Details
Main Authors:	Du, Weihong; Liao, Wenrui; Yan, Binyu; Liang, Hongru; Cohn, Anthony G.; Lei, Wenqiang
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.14079

Similar Items

PAGED: A Benchmark for Procedural Graphs Extraction from Documents
by: Du, Weihong, et al.
Published: (2024)

CARE: A Clue-guided Assistant for CSRs to Read User Manuals
by: Du, Weihong, et al.
Published: (2024)

A LLM Benchmark based on the Minecraft Builder Dialog Agent Task
by: Madge, Chris, et al.
Published: (2024)

SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View
by: Xiao, Yongjie, et al.
Published: (2025)

Can Large Language Models Reason about the Region Connection Calculus?
by: Cohn, Anthony G, et al.
Published: (2024)

Evaluating the Ability of Large Language Models to Reason about Cardinal Directions
by: Cohn, Anthony G, et al.
Published: (2024)

Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited
by: Cohn, Anthony G, et al.
Published: (2025)

Dishonesty in Helpful and Harmless Alignment
by: Huang, Youcheng, et al.
Published: (2024)

Large Language Models as Minecraft Agents
by: Madge, Chris, et al.
Published: (2024)

Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning
by: Li, Fangjun, et al.
Published: (2024)

GraphOTTER: Evolving LLM-based Graph Reasoning for Complex Table Question Answering
by: Li, Qianlong, et al.
Published: (2024)

Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark
by: Li, Fangjun, et al.
Published: (2024)

Exploring Spatial Representations in the Historical Lake District Texts with LLM-based Relation Extraction
by: Haris, Erum, et al.
Published: (2024)

Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation
by: Liang, Chen, et al.
Published: (2024)

Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation
by: Zhang, Tong, et al.
Published: (2024)

CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models
by: Zhang, Tong, et al.
Published: (2024)

An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning
by: Kaur, Navdeep, et al.
Published: (2025)

BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues
by: Jayannavar, Prashant, et al.
Published: (2025)

Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores
by: Blackwell, Robert E., et al.
Published: (2024)

A Survey of the Evolution of Language Model-Based Dialogue Systems: Data, Task and Models
by: Wang, Hongru, et al.
Published: (2023)

Personalized Decision Modeling: Utility Optimization or Textualized-Symbolic Reasoning
by: Zhao, Yibo, et al.
Published: (2025)

MDC-R: The Minecraft Dialogue Corpus with Reference
by: Madge, Chris, et al.
Published: (2025)

Nebula: A discourse aware Minecraft Builder
by: Chaturvedi, Akshay, et al.
Published: (2024)

On Fine-Grained I/O Complexity of Attention Backward Passes
by: Li, Xiaoyu, et al.
Published: (2024)

TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft
by: Long, Qian, et al.
Published: (2024)

Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks
by: Liao, Huanxuan, et al.
Published: (2024)

From Hypothesis to Premises: LLM-based Backward Logical Reasoning with Selective Symbolic Translation
by: Li, Qingchuan, et al.
Published: (2025)

MCPDial: A Minecraft Persona-driven Dialogue Dataset
by: Alavi, Seyed Hossein, et al.
Published: (2024)

ARAIDA: Analogical Reasoning-Augmented Interactive Data Annotation
by: Huang, Chen, et al.
Published: (2024)

Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations
by: Cai, Wenrui, et al.
Published: (2025)

Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series
by: Cai, Wenrui, et al.
Published: (2025)

MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks
by: Zhang, Lei, et al.
Published: (2025)

A Notion of Complexity for Theory of Mind via Discrete World Models
by: Huang, X. Angelo, et al.
Published: (2024)

Large Reasoning Models Struggle to Transfer Parametric Knowledge Across Scripts
by: Bandarkar, Lucas, et al.
Published: (2026)

Enhancing Reasoning Abilities of Small LLMs with Cognitive Alignment
by: Cai, Wenrui, et al.
Published: (2025)

A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning
by: Hu, Lijie, et al.
Published: (2024)

BAR-Analytics: A Web-based Platform for Analyzing Information Spreading Barriers in News: Comparative Analysis Across Multiple Barriers and Events
by: Sittar, Abdul, et al.
Published: (2025)

Reasoning-preserved Efficient Distillation of Large Language Models via Activation-aware Initialization
by: He, Junlin, et al.
Published: (2026)

CRAB-Bench: Evaluating LLM Agents under Complex Task Dependencies and Human-aligned User Simulation
by: Wang, Danqing, et al.
Published: (2026)

Validating Political Position Predictions of Arguments
by: Robinson, Jordan, et al.
Published: (2026)