| Main Authors: | Wu, Shirley, Galley, Michel, Peng, Baolin, Cheng, Hao, Li, Gavin, Dou, Yao, Cai, Weixin, Zou, James, Leskovec, Jure, Gao, Jianfeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.00640 |
Similar Items
SimulatorArena: Are User Simulators Reliable Proxies for Multi-Turn Evaluation of AI Assistants?
by: Dou, Yao, et al.
Published: (2025)
Teaching Language Models to Self-Improve through Interactive Demonstrations
by: Yu, Xiao, et al.
Published: (2023)
Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models
by: Li, Miaoran, et al.
Published: (2023)
ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning
by: Yu, Xiao, et al.
Published: (2024)
Found in Conversation: LLMs Teach Themselves to Close the Multi-Turn Gap
by: Chen, Tianlang, et al.
Published: (2026)
Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents
by: Yu, Xiao, et al.
Published: (2025)
GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts
by: Wu, Shirley, et al.
Published: (2023)
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
by: Yu, Xiao, et al.
Published: (2025)
Uncalibrated Reasoning: GRPO Induces Overconfidence for Stochastic Outcomes
by: Bereket, Michael, et al.
Published: (2025)
Synthetic Computers at Scale for Long-Horizon Productivity Simulation
by: Ge, Tao, et al.
Published: (2026)
ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration
by: Estornell, Andrew, et al.
Published: (2024)
RelGNN: Composite Message Passing for Relational Deep Learning
by: Chen, Tianlang, et al.
Published: (2025)
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs
by: Xu, Jialiang, et al.
Published: (2024)
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
by: Wu, Shirley, et al.
Published: (2024)
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning
by: Wu, Shirley, et al.
Published: (2024)
Large Language Models are Good Relational Learners
by: Wu, Fang, et al.
Published: (2025)
CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis
by: Venkatraman, Saranya, et al.
Published: (2024)
Rethinking Interpretability in the Era of Large Language Models
by: Singh, Chandan, et al.
Published: (2024)
AgentCollab: A Self-Evaluation-Driven Collaboration Paradigm for Efficient LLM Agents
by: Gao, Wenbo, et al.
Published: (2026)
Collab-Solver: Collaborative Solving Policy Learning for Mixed-Integer Linear Programming
by: Li, Siyuan, et al.
Published: (2025)
RFG: Test-Time Scaling for Diffusion Large Language Model Reasoning with Reward-Free Guidance
by: Chen, Tianlang, et al.
Published: (2025)
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
by: Huang, Qian, et al.
Published: (2023)
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents
by: Sun, Haochen, et al.
Published: (2025)
CollabEval: Enhancing LLM-as-a-Judge via Multi-Agent Collaboration
by: Qian, Yiyue, et al.
Published: (2026)
PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents
by: Yang, Ke, et al.
Published: (2026)
The Tool Illusion: Rethinking Tool Use in Web Agents
by: Lou, Renze, et al.
Published: (2026)
Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures
by: Dwivedi, Vijay Prakash, et al.
Published: (2025)
Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities
by: Sun, Chung-En, et al.
Published: (2024)
AutoSurfer -- Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling
by: Faisal, Fazle Elahi, et al.
Published: (2026)
GLEAN: Active Generalized Category Discovery with Diverse LLM Feedback
by: Zou, Henry Peng, et al.
Published: (2025)
Uncertainty Quantification for Forward and Inverse Problems of PDEs via Latent Global Evolution
by: Wu, Tailin, et al.
Published: (2024)
TimeGraphs: Graph-based Temporal Reasoning
by: Maheshwari, Paridhi, et al.
Published: (2024)
Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting
by: Chang, Serina, et al.
Published: (2024)
Learning over Positive and Negative Edges with Contrastive Message Passing
by: Pao-Huang, Peter, et al.
Published: (2026)
SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration
by: Bohus, Dan, et al.
Published: (2025)
CollabEdit: Towards Non-destructive Collaborative Knowledge Editing
by: Zheng, Jiamu, et al.
Published: (2024)
Test-Time Learning with an Evolving Library
by: Xu, Weijia, et al.
Published: (2026)
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
by: Yang, Rui, et al.
Published: (2026)
HumanLM: Simulating Users with State Alignment Beats Response Imitation
by: Wu, Shirley, et al.
Published: (2026)
Learning Efficient Positional Encodings with Graph Neural Networks
by: Kanatsoulis, Charilaos I., et al.
Published: (2025)