| Main Authors: | Chen, Zhuang, Wu, Jincenzi, Zhou, Jinfeng, Wen, Bosi, Bi, Guanqun, Jiang, Gongyao, Cao, Yaru, Hu, Mengting, Lai, Yunghwei, Xiong, Zexuan, Huang, Minlie |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.15052 |
Similar Items
SocialSim: Towards Socialized Simulation of Emotional Support Conversation
by: Chen, Zhuang, et al.
Published: (2025)
COKE: A Cognitive Knowledge Graph for Machine Theory of Mind
by: Wu, Jincenzi, et al.
Published: (2023)
CharacterBench: Benchmarking Character Customization of Large Language Models
by: Zhou, Jinfeng, et al.
Published: (2024)
UniToMBench: Integrating Perspective-Taking to Improve Theory of Mind in LLMs
by: Thiyagarajan, Prameshwar, et al.
Published: (2025)
SS-GEN: A Social Story Generation Framework with Large Language Models
by: Feng, Yi, et al.
Published: (2024)
T2MBench: A Benchmark for Out-of-Distribution Text-to-Motion Generation
by: Yang, Bin, et al.
Published: (2026)
Unveiling the Landscape of Clinical Depression Assessment: From Behavioral Signatures to Psychiatric Reasoning
by: Chen, Zhuang, et al.
Published: (2025)
Beyond Words: Evaluating and Bridging Epistemic Divergence in User-Agent Interaction via Theory of Mind
by: Ruan, Minyuan, et al.
Published: (2026)
SocialEval: Evaluating Social Intelligence of Large Language Models
by: Zhou, Jinfeng, et al.
Published: (2025)
A Group Fairness Lens for Large Language Models
by: Bi, Guanqun, et al.
Published: (2023)
StoryTR: Narrative-Centric Video Temporal Retrieval with Theory of Mind Reasoning
by: Zhong, Xuanyue, et al.
Published: (2026)
PsychePass: Calibrating LLM Therapeutic Competence via Trajectory-Anchored Tournaments
by: Chen, Zhuang, et al.
Published: (2026)
Chart-CoCa: Self-Improving Chart Understanding of Vision LMs via Code-Driven Synthesis and Candidate-Conditioned Answering
by: Jiang, Gongyao, et al.
Published: (2025)
MBench: A Comprehensive Benchmark on Memory Capability for Video World Models
by: Zhang, Shengjun, et al.
Published: (2026)
RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models
by: Feng, Andrew Zhuoer, et al.
Published: (2026)
Benchmarking Complex Instruction-Following with Multiple Constraints Composition
by: Wen, Bosi, et al.
Published: (2024)
IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation
by: Wen, Bosi, et al.
Published: (2026)
JRE-L: Journalist, Reader, and Editor LLMs in the Loop for Science Journalism for the General Audience
by: Jiang, Gongyao, et al.
Published: (2025)
LLM-Collaboration on Automatic Science Journalism for the General Audience
by: Jiang, Gongyao, et al.
Published: (2024)
HPSS: Heuristic Prompting Strategy Search for LLM Evaluators
by: Wen, Bosi, et al.
Published: (2025)
Patient-Zero: Scaling Synthetic Patient Agents to Real-World Distributions without Real Patient Data
by: Lai, Yunghwei, et al.
Published: (2025)
MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
by: Wang, Xiaolong, et al.
Published: (2025)
Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models
by: Cao, Guanqun, et al.
Published: (2025)
Position: Theory of Mind Benchmarks are Broken for Large Language Models
by: Riemer, Matthew, et al.
Published: (2024)
PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues
by: Yu, Fangxu, et al.
Published: (2025)
MAGI: Multi-Agent Guided Interview for Psychiatric Assessment
by: Bi, Guanqun, et al.
Published: (2025)
AlignBench: Benchmarking Chinese Alignment of Large Language Models
by: Liu, Xiao, et al.
Published: (2023)
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark
by: Wang, Dingdong, et al.
Published: (2025)
Mind the Motions: Benchmarking Theory-of-Mind in Everyday Body Language
by: Lee, Seungbeen, et al.
Published: (2025)
Towards Optimal Learning of Language Models
by: Gu, Yuxian, et al.
Published: (2024)
Theory of Mind in Large Language Models: Assessment and Enhancement
by: Chen, Ruirui, et al.
Published: (2025)
The Dialogue That Heals: A Comprehensive Evaluation of Doctor Agents' Inquiry Capability
by: Gong, Linlu, et al.
Published: (2025)
Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning
by: Lai, Yunghwei, et al.
Published: (2025)
The Effects of Market Power Discrepancy on Trade Credit Scales: A Paradoxical Perspective of Digitalization
by: Wang, Weiqing, et al.
Published: (2025)
CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation
by: Ke, Pei, et al.
Published: (2023)
Ψ-Arena: Interactive Assessment and Optimization of LLM-based Psychological Counselors with Tripartite Feedback
by: Zhu, Shijing, et al.
Published: (2025)
Data Selection via Optimal Control for Language Models
by: Gu, Yuxian, et al.
Published: (2024)
Training Language Model to Critique for Better Refinement
by: Yu, Tianshu, et al.
Published: (2025)
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
by: Qiu, Zexuan, et al.
Published: (2024)
Enhanced Large Language Models for Effective Screening of Depression and Anxiety
by: Liu, June M., et al.
Published: (2025)