| Main Authors: | Xie, Jian, Zhang, Kai, Chen, Jiangjie, Zhu, Tinghui, Lou, Renze, Tian, Yuandong, Xiao, Yanghua, Su, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2402.01622 |
Similar Items
Flex-TravelPlanner: A Benchmark for Flexible Planning with Language Agents
by: Oh, Juhyun, et al.
Published: (2025)
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
by: Xie, Jian, et al.
Published: (2023)
How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?
by: Wu, Siye, et al.
Published: (2024)
TravelAgent: An AI Assistant for Personalized Travel Planning
by: Chen, Aili, et al.
Published: (2024)
Revealing the Barriers of Language Agents in Planning
by: Xie, Jian, et al.
Published: (2024)
Can LLMs Learn to Map the World from Local Descriptions?
by: Xia, Sirui, et al.
Published: (2025)
Enhancing Language Agent Strategic Reasoning through Self-Play in Adversarial Games
by: Zhang, Yikai, et al.
Published: (2025)
Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning
by: Zhu, Tinghui, et al.
Published: (2024)
MCiteBench: A Multimodal Benchmark for Generating Text with Citations
by: Hu, Caiyu, et al.
Published: (2025)
Can We Rely on LLM Agents to Draft Long-Horizon Plans? Let's Take TravelPlanner as an Example
by: Chen, Yanan, et al.
Published: (2024)
Large Language Model Instruction Following: A Survey of Progresses and Challenges
by: Lou, Renze, et al.
Published: (2023)
TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation
by: Zhang, Yikai, et al.
Published: (2024)
From Persona to Personalization: A Survey on Role-Playing Language Agents
by: Chen, Jiangjie, et al.
Published: (2024)
ARIA: Training Language Agents with Intention-Driven Reward Aggregation
by: Yang, Ruihan, et al.
Published: (2025)
The Model Agreed, But Didn't Learn: Diagnosing Surface Compliance in Large Language Models
by: Gu, Xiaojie, et al.
Published: (2026)
ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents
by: Shao, Jie-Jing, et al.
Published: (2024)
GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick
by: Fu, Jiayi, et al.
Published: (2024)
Toward Zero-Shot Instruction Following
by: Lou, Renze, et al.
Published: (2023)
SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals
by: Yang, Ruihan, et al.
Published: (2024)
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following
by: Lou, Renze, et al.
Published: (2023)
Is Extending Modality The Right Path Towards Omni-Modality?
by: Zhu, Tinghui, et al.
Published: (2025)
ANALOGYKB: Unlocking Analogical Reasoning of Language Models with A Million-scale Knowledge Base
by: Yuan, Siyu, et al.
Published: (2023)
Piecing Together Clues: A Benchmark for Evaluating the Detective Skills of Large Language Models
by: Gu, Zhouhong, et al.
Published: (2023)
ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning
by: Choi, Jihye, et al.
Published: (2025)
To the Globe (TTG): Towards Language-Driven Guaranteed Travel Planning
by: JU, Da, et al.
Published: (2024)
ARM: Adaptive Reasoning Model
by: Wu, Siye, et al.
Published: (2025)
WorldTravel: A Realistic Multimodal Travel-Planning Benchmark with Tightly Coupled Constraints
by: Wang, Zexuan, et al.
Published: (2026)
Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning
by: Jiang, Song, et al.
Published: (2024)
SurveyAgent: A Conversational System for Personalized and Efficient Research Survey
by: Wang, Xintao, et al.
Published: (2024)
TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning
by: Ni, Hang, et al.
Published: (2025)
GroupTravelBench: Benchmarking LLM Agents on Multi-Person Travel Planning
by: Cheng, Xiang, et al.
Published: (2026)
Past Meets Present: Creating Historical Analogy with Large Language Models
by: Li, Nianqi, et al.
Published: (2024)
BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation
by: Ran, Yiting, et al.
Published: (2025)
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
by: Renze, Matthew, et al.
Published: (2024)
Recent Advancement of Emotion Cognition in Large Language Models
by: Chen, Yuyan, et al.
Published: (2024)
DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?
by: Gu, Zhouhong, et al.
Published: (2024)
CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning
by: Wu, Siye, et al.
Published: (2026)
The Effect of Sampling Temperature on Problem Solving in Large Language Models
by: Renze, Matthew, et al.
Published: (2024)
Ask-before-Plan: Proactive Language Agents for Real-World Planning
by: Zhang, Xuan, et al.
Published: (2024)
Curse of Knowledge: When Complex Evaluation Context Benefits yet Biases LLM Judges
by: Li, Weiyuan, et al.
Published: (2025)