| Main Authors: | Wang, Ruiyi, Yu, Haofei, Zhang, Wenxin, Qi, Zhengyang, Sap, Maarten, Neubig, Graham, Bisk, Yonatan, Zhu, Hao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.08715 |
Similar Items
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
by: Zhou, Xuhui, et al.
Published: (2023)
LIFELONG SOTOPIA: Evaluating Social Intelligence of Language Agents Over Lifelong Social Interactions
by: Goel, Hitesh, et al.
Published: (2025)
Stereotype or Personalization? User Identity Biases Chatbot Recommendations
by: Kantharuban, Anjali, et al.
Published: (2024)
SOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind
by: YS, Yashwanth, et al.
Published: (2026)
Language Models Need Inductive Biases to Count Inductively
by: Chang, Yingshan, et al.
Published: (2024)
Training Proactive and Personalized LLM Agents
by: Sun, Weiwei, et al.
Published: (2025)
Ambig-SWE: Interactive Agents to Overcome Underspecificity in Software Engineering
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)
Sotopia-RL: Reward Design for Social Intelligence
by: Yu, Haofei, et al.
Published: (2025)
Gradient Localization Improves Lifelong Pretraining of Language Models
by: Fernandez, Jared, et al.
Published: (2024)
SOTOPIA-$Ω$: Dynamic Strategy Injection Learning and Social Instruction Following Evaluation for Social Agents
by: Zhang, Wenyuan, et al.
Published: (2025)
SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers
by: Xuan, Keyang, et al.
Published: (2026)
WebArena: A Realistic Web Environment for Building Autonomous Agents
by: Zhou, Shuyan, et al.
Published: (2023)
When Should AI Read the Room? Public Perceptions of Social Intelligence in AI Agents
by: Mathur, Leena, et al.
Published: (2026)
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
by: Fan, Xianzhe, et al.
Published: (2025)
Go-Browse: Training Web Agents with Structured Exploration
by: Gandhi, Apurva, et al.
Published: (2025)
Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
by: Zhou, Xuhui, et al.
Published: (2024)
Effective Strategies for Asynchronous Software Engineering Agents
by: Geng, Jiayi, et al.
Published: (2026)
Training Versatile Coding Agents in Synthetic Environments
by: Zhu, Yiqi, et al.
Published: (2025)
MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts
by: Yu, Haofei, et al.
Published: (2023)
What Are Tools Anyway? A Survey from the Language Model Perspective
by: Wang, Zhiruo, et al.
Published: (2024)
BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models
by: Tjuatja, Lindia, et al.
Published: (2025)
An Incomplete Loop: Instruction Inference, Instruction Following, and In-context Learning in Language Models
by: Liu, Emmy, et al.
Published: (2024)
TOM-SWE: User Mental Modeling For Software Engineering Agents
by: Zhou, Xuhui, et al.
Published: (2025)
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
by: Zhang, Charlie, et al.
Published: (2025)
Agent Workflow Memory
by: Wang, Zora Zhiruo, et al.
Published: (2024)
Affordance RAG: Hierarchical Multimodal Retrieval with Affordance-Aware Embodied Memory for Mobile Manipulation
by: Korekata, Ryosuke, et al.
Published: (2025)
Energy Considerations of Large Language Model Inference and Efficiency Optimizations
by: Fernandez, Jared, et al.
Published: (2025)
TinyScientist: An Interactive, Extensible, and Controllable Framework for Building Research Agents
by: Yu, Haofei, et al.
Published: (2025)
Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate
by: Chern, Steffi, et al.
Published: (2024)
AutoPresent: Designing Structured Visuals from Scratch
by: Ge, Jiaxin, et al.
Published: (2025)
Looking beyond the next token
by: Thankaraj, Abitha, et al.
Published: (2025)
Tools Fail: Detecting Silent Errors in Faulty Tools
by: Sun, Jimin, et al.
Published: (2024)
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models
by: Mendelsohn, Julia, et al.
Published: (2023)
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
by: Baheti, Ashutosh, et al.
Published: (2023)
Learning Model Successors
by: Chang, Yingshan, et al.
Published: (2025)
NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models
by: Rao, Abhinav, et al.
Published: (2024)
Beyond Facts: Evaluating Intent Hallucination in Large Language Models
by: Hao, Yijie, et al.
Published: (2025)
Beyond Browsing: API-Based Web Agents
by: Song, Yueqi, et al.
Published: (2024)
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)
Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty
by: Zhou, Kaitlyn, et al.
Published: (2024)