| Main Authors: | Du, Weihong, Liao, Wenrui, Yan, Binyu, Liang, Hongru, Cohn, Anthony G., Lei, Wenqiang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.14079 |
Similar Items
PAGED: A Benchmark for Procedural Graphs Extraction from Documents
by: Du, Weihong, et al.
Published: (2024)
CARE: A Clue-guided Assistant for CSRs to Read User Manuals
by: Du, Weihong, et al.
Published: (2024)
A LLM Benchmark based on the Minecraft Builder Dialog Agent Task
by: Madge, Chris, et al.
Published: (2024)
SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View
by: Xiao, Yongjie, et al.
Published: (2025)
Can Large Language Models Reason about the Region Connection Calculus?
by: Cohn, Anthony G, et al.
Published: (2024)
Evaluating the Ability of Large Language Models to Reason about Cardinal Directions
by: Cohn, Anthony G, et al.
Published: (2024)
Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited
by: Cohn, Anthony G, et al.
Published: (2025)
Dishonesty in Helpful and Harmless Alignment
by: Huang, Youcheng, et al.
Published: (2024)
Large Language Models as Minecraft Agents
by: Madge, Chris, et al.
Published: (2024)
Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning
by: Li, Fangjun, et al.
Published: (2024)
GraphOTTER: Evolving LLM-based Graph Reasoning for Complex Table Question Answering
by: Li, Qianlong, et al.
Published: (2024)
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark
by: Li, Fangjun, et al.
Published: (2024)
Exploring Spatial Representations in the Historical Lake District Texts with LLM-based Relation Extraction
by: Haris, Erum, et al.
Published: (2024)
Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation
by: Liang, Chen, et al.
Published: (2024)
Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation
by: Zhang, Tong, et al.
Published: (2024)
CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models
by: Zhang, Tong, et al.
Published: (2024)
An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning
by: Kaur, Navdeep, et al.
Published: (2025)
BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues
by: Jayannavar, Prashant, et al.
Published: (2025)
Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores
by: Blackwell, Robert E., et al.
Published: (2024)
A Survey of the Evolution of Language Model-Based Dialogue Systems: Data, Task and Models
by: Wang, Hongru, et al.
Published: (2023)
Personalized Decision Modeling: Utility Optimization or Textualized-Symbolic Reasoning
by: Zhao, Yibo, et al.
Published: (2025)
MDC-R: The Minecraft Dialogue Corpus with Reference
by: Madge, Chris, et al.
Published: (2025)
Nebula: A discourse aware Minecraft Builder
by: Chaturvedi, Akshay, et al.
Published: (2024)
On Fine-Grained I/O Complexity of Attention Backward Passes
by: Li, Xiaoyu, et al.
Published: (2024)
TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft
by: Long, Qian, et al.
Published: (2024)
Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks
by: Liao, Huanxuan, et al.
Published: (2024)
From Hypothesis to Premises: LLM-based Backward Logical Reasoning with Selective Symbolic Translation
by: Li, Qingchuan, et al.
Published: (2025)
MCPDial: A Minecraft Persona-driven Dialogue Dataset
by: Alavi, Seyed Hossein, et al.
Published: (2024)
ARAIDA: Analogical Reasoning-Augmented Interactive Data Annotation
by: Huang, Chen, et al.
Published: (2024)
Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations
by: Cai, Wenrui, et al.
Published: (2025)
Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series
by: Cai, Wenrui, et al.
Published: (2025)
MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks
by: Zhang, Lei, et al.
Published: (2025)
A Notion of Complexity for Theory of Mind via Discrete World Models
by: Huang, X. Angelo, et al.
Published: (2024)
Large Reasoning Models Struggle to Transfer Parametric Knowledge Across Scripts
by: Bandarkar, Lucas, et al.
Published: (2026)
Enhancing Reasoning Abilities of Small LLMs with Cognitive Alignment
by: Cai, Wenrui, et al.
Published: (2025)
A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning
by: Hu, Lijie, et al.
Published: (2024)
BAR-Analytics: A Web-based Platform for Analyzing Information Spreading Barriers in News: Comparative Analysis Across Multiple Barriers and Events
by: Sittar, Abdul, et al.
Published: (2025)
Reasoning-preserved Efficient Distillation of Large Language Models via Activation-aware Initialization
by: He, Junlin, et al.
Published: (2026)
CRAB-Bench: Evaluating LLM Agents under Complex Task Dependencies and Human-aligned User Simulation
by: Wang, Danqing, et al.
Published: (2026)
Validating Political Position Predictions of Arguments
by: Robinson, Jordan, et al.
Published: (2026)