:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Ruiyi; Yu, Haofei; Zhang, Wenxin; Qi, Zhengyang; Sap, Maarten; Neubig, Graham; Bisk, Yonatan; Zhu, Hao
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2403.08715

Similar Items

SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
by: Zhou, Xuhui, et al.
Published: (2023)

LIFELONG SOTOPIA: Evaluating Social Intelligence of Language Agents Over Lifelong Social Interactions
by: Goel, Hitesh, et al.
Published: (2025)

Stereotype or Personalization? User Identity Biases Chatbot Recommendations
by: Kantharuban, Anjali, et al.
Published: (2024)

SOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind
by: YS, Yashwanth, et al.
Published: (2026)

Language Models Need Inductive Biases to Count Inductively
by: Chang, Yingshan, et al.
Published: (2024)

Training Proactive and Personalized LLM Agents
by: Sun, Weiwei, et al.
Published: (2025)

Ambig-SWE: Interactive Agents to Overcome Underspecificity in Software Engineering
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)

Sotopia-RL: Reward Design for Social Intelligence
by: Yu, Haofei, et al.
Published: (2025)

Gradient Localization Improves Lifelong Pretraining of Language Models
by: Fernandez, Jared, et al.
Published: (2024)

SOTOPIA-Ω: Dynamic Strategy Injection Learning and Social Instruction Following Evaluation for Social Agents
by: Zhang, Wenyuan, et al.
Published: (2025)

SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers
by: Xuan, Keyang, et al.
Published: (2026)

WebArena: A Realistic Web Environment for Building Autonomous Agents
by: Zhou, Shuyan, et al.
Published: (2023)

When Should AI Read the Room? Public Perceptions of Social Intelligence in AI Agents
by: Mathur, Leena, et al.
Published: (2026)

SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
by: Fan, Xianzhe, et al.
Published: (2025)

Go-Browse: Training Web Agents with Structured Exploration
by: Gandhi, Apurva, et al.
Published: (2025)

Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
by: Zhou, Xuhui, et al.
Published: (2024)

Effective Strategies for Asynchronous Software Engineering Agents
by: Geng, Jiayi, et al.
Published: (2026)

Training Versatile Coding Agents in Synthetic Environments
by: Zhu, Yiqi, et al.
Published: (2025)

MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts
by: Yu, Haofei, et al.
Published: (2023)

What Are Tools Anyway? A Survey from the Language Model Perspective
by: Wang, Zhiruo, et al.
Published: (2024)

BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models
by: Tjuatja, Lindia, et al.
Published: (2025)

An Incomplete Loop: Instruction Inference, Instruction Following, and In-context Learning in Language Models
by: Liu, Emmy, et al.
Published: (2024)

TOM-SWE: User Mental Modeling For Software Engineering Agents
by: Zhou, Xuhui, et al.
Published: (2025)

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
by: Zhang, Charlie, et al.
Published: (2025)

Agent Workflow Memory
by: Wang, Zora Zhiruo, et al.
Published: (2024)

Affordance RAG: Hierarchical Multimodal Retrieval with Affordance-Aware Embodied Memory for Mobile Manipulation
by: Korekata, Ryosuke, et al.
Published: (2025)

Energy Considerations of Large Language Model Inference and Efficiency Optimizations
by: Fernandez, Jared, et al.
Published: (2025)

TinyScientist: An Interactive, Extensible, and Controllable Framework for Building Research Agents
by: Yu, Haofei, et al.
Published: (2025)

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate
by: Chern, Steffi, et al.
Published: (2024)

AutoPresent: Designing Structured Visuals from Scratch
by: Ge, Jiaxin, et al.
Published: (2025)

Looking beyond the next token
by: Thankaraj, Abitha, et al.
Published: (2025)

Tools Fail: Detecting Silent Errors in Faulty Tools
by: Sun, Jimin, et al.
Published: (2024)

From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models
by: Mendelsohn, Julia, et al.
Published: (2023)

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
by: Baheti, Ashutosh, et al.
Published: (2023)

Learning Model Successors
by: Chang, Yingshan, et al.
Published: (2025)

NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models
by: Rao, Abhinav, et al.
Published: (2024)

Beyond Facts: Evaluating Intent Hallucination in Large Language Models
by: Hao, Yijie, et al.
Published: (2025)

Beyond Browsing: API-Based Web Agents
by: Song, Yueqi, et al.
Published: (2024)

OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)

Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty
by: Zhou, Kaitlyn, et al.
Published: (2024)