| Main authors: | Su, Ying; Ling, Zhan; Shi, Haochen; Cheng, Jiayang; Yim, Yauwai; Song, Yangqiu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online access: | https://arxiv.org/abs/2410.03907 |
Similar items
LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game
by: Liang, Fangzhou, et al.
Published: (2025)
NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding
by: Chan, Chunkit, et al.
Published: (2024)
CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge
by: Zheng, Tianshi, et al.
Published: (2024)
Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information
by: Yim, Yauwai, et al.
Published: (2024)
Persona Knowledge-Aligned Prompt Tuning Method for Online Debate
by: Chan, Chunkit, et al.
Published: (2024)
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph
by: Wang, Zhaowei, et al.
Published: (2023)
MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset
by: Wang, Weiqi, et al.
Published: (2024)
Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction
by: Deng, Zheye, et al.
Published: (2024)
InteGround: On the Evaluation of Verification and Retrieval Planning in Integrative Grounding
by: Jiayang, Cheng, et al.
Published: (2025)
XToM: Exploring the Multilingual Theory of Mind for Large Language Models
by: Chan, Chunkit, et al.
Published: (2025)
PreAct: Prediction Enhances Agent's Planning Ability
by: Fu, Dayuan, et al.
Published: (2024)
ISO-Bench: Benchmarking Multimodal Causal Reasoning in Visual-Language Models through Procedural Plans
by: Sadana, Ananya, et al.
Published: (2025)
CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning
by: Wang, Weiqi, et al.
Published: (2024)
OmniCompliance-100K: A Multi-Domain, Rule-Grounded, Real-World Safety Compliance Dataset
by: Hu, Wenbin, et al.
Published: (2026)
LogiDynamics: Unraveling the Dynamics of Inductive, Abductive and Deductive Logical Inferences in LLM Reasoning
by: Zheng, Tianshi, et al.
Published: (2025)
LLaPa: A Vision-Language Model Framework for Counterfactual-Aware Procedural Planning
by: Sun, Shibo, et al.
Published: (2025)
AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
by: Jiayang, Cheng, et al.
Published: (2026)
PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization
by: Jing, Huihao, et al.
Published: (2026)
Anticipate & Act: Integrating LLMs and Classical Planning for Efficient Task Execution in Household Environments
by: Arora, Raghav, et al.
Published: (2025)
Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance
by: Hu, Wenbin, et al.
Published: (2025)
EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs
by: Jiayang, Cheng, et al.
Published: (2024)
PipeNet: Question Answering with Semantic Pruning over Knowledge Graphs
by: Su, Ying, et al.
Published: (2024)
The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas
by: Xu, Baixuan, et al.
Published: (2025)
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce
by: Ding, Wenxuan, et al.
Published: (2024)
Monte Carlo Planning with Large Language Model for Text-Based Game Agents
by: Shi, Zijing, et al.
Published: (2025)
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
by: Erdogan, Lutfi Eren, et al.
Published: (2025)
ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations
by: Chan, Chunkit, et al.
Published: (2023)
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
by: Chen, Yi, et al.
Published: (2023)
DixitWorld: Evaluating Multimodal Abductive Reasoning in Vision-Language Models with Multi-Agent Dixit Gameplay
by: Mo, Yunxiang, et al.
Published: (2025)
GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps
by: Nasir, Muhammad Umair, et al.
Published: (2024)
On the Ability of Transformers to Verify Plans
by: Sarrof, Yash, et al.
Published: (2026)
EcomEdit: An Automated E-commerce Knowledge Editing Framework for Enhanced Product and Purchase Intention Understanding
by: Lau, Ching Ming Samuel, et al.
Published: (2024)
Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning
by: Yue, Yuanhao, et al.
Published: (2024)
PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset
by: Uzunoglu, Arda, et al.
Published: (2024)
Deliberate Planning in Language Models with Symbolic Representation
by: Xiong, Siheng, et al.
Published: (2025)
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
by: Lu, Feihong, et al.
Published: (2024)
Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?
by: Wang, Qineng, et al.
Published: (2024)
Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning
by: Fei, Zhaoye, et al.
Published: (2025)
Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models
by: Lin, Zizheng, et al.
Published: (2024)
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
by: Xie, Jian, et al.
Published: (2024)