| Main Author: | Gusev, Ilya |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.06820 |
Similar Items
PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues
by: Farhansyah, Mohammad Rifqi, et al.
Published: (2026)
RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing
by: Xiang, Hao, et al.
Published: (2025)
Role-Playing Evaluation for Large Language Models
by: Boudouri, Yassine El, et al.
Published: (2025)
A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models
by: Wang, Jiayin, et al.
Published: (2024)
Egocentric Event-Based Vision for Ping Pong Ball Trajectory Prediction
by: Alberico, Ivan, et al.
Published: (2025)
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
by: Wang, Zekun Moore, et al.
Published: (2023)
MT-PingEval: Evaluating Multi-Turn Collaboration with Private Information Games
by: Eisenstein, Jacob, et al.
Published: (2026)
RPGBENCH: Evaluating Large Language Models as Role-Playing Game Engines
by: Yu, Pengfei, et al.
Published: (2025)
Parallelize Over Data Particle Advection: Participation, Ping Pong Particles, and Overhead
by: Wang, Zhe, et al.
Published: (2024)
Rehearse With User: Personalized Opinion Summarization via Role-Playing based on Large Language Models
by: Zhang, Yanyue, et al.
Published: (2025)
RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models
by: Shen, Tianhao, et al.
Published: (2023)
SpeechRole: A Large-Scale Dataset and Benchmark for Evaluating Speech Role-Playing Agents
by: Jiang, Changhao, et al.
Published: (2025)
CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation
by: Tu, Quan, et al.
Published: (2024)
Evaluating Language Translation Models by Playing Telephone
by: Saba, Syeda Jannatus, et al.
Published: (2025)
TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models
by: Ahn, Jaewoo, et al.
Published: (2024)
VoxRole: A Comprehensive Benchmark for Evaluating Speech-Based Role-Playing Agents
by: Wu, Weihao, et al.
Published: (2025)
Ping‐Pong Gaze in Sporadic Creutzfeldt‐Jakob Disease
by: Yuan Yang, et al.
Published: (2024)
Controlling Summarization Length Through EOS Token Weighting
by: Belligoli, Zeno, et al.
Published: (2025)
DEBATE: A Large-Scale Benchmark for Evaluating Opinion Dynamics in Role-Playing LLM Agents
by: Chuang, Yun-Shiuan, et al.
Published: (2025)
The Oscars of AI Theater: A Survey on Role-Playing with Language Models
by: Chen, Nuo, et al.
Published: (2024)
Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
by: Chen, Chaoran, et al.
Published: (2025)
Large Language Model-based Role-Playing for Personalized Medical Jargon Extraction
by: Lim, Jung Hoon, et al.
Published: (2024)
RoleMRC: A Fine-Grained Composite Benchmark for Role-Playing and Instruction-Following
by: Lu, Junru, et al.
Published: (2025)
On the Decision-Making Abilities in Role-Playing using Large Language Models
by: Shen, Chenglei, et al.
Published: (2024)
Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models
by: Su, Yuchen, et al.
Published: (2026)
RoleCraft-GLM: Advancing Personalized Role-Playing in Large Language Models
by: Tao, Meiling, et al.
Published: (2023)
Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects
by: Peng, Ji-Lun, et al.
Published: (2026)
User Profile with Large Language Models: Construction, Updating, and Benchmarking
by: Prottasha, Nusrat Jahan, et al.
Published: (2025)
HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing
by: Chen, Jing, et al.
Published: (2024)
Role-Play Paradox in Large Language Models: Reasoning Performance Gains and Ethical Dilemmas
by: Zhao, Jinman, et al.
Published: (2024)
Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models
by: Costa, Davi Bastos, et al.
Published: (2025)
Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study
by: Xu, Liuchang, et al.
Published: (2024)
MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models
by: Kwan, Wai-Chung, et al.
Published: (2024)
McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models
by: Lan, Tian, et al.
Published: (2025)
VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing
by: Xu, Jiacheng, et al.
Published: (2026)
A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models
by: Liu, Jie, et al.
Published: (2024)
Flipping the Dialogue: Training and Evaluating User Language Models
by: Naous, Tarek, et al.
Published: (2025)
Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models
by: Chiyah-Garcia, Javier, et al.
Published: (2024)
MedRiskEval: Medical Risk Evaluation Benchmark of Language Models, On the Importance of User Perspectives in Healthcare Settings
by: Corbeil, Jean-Philippe, et al.
Published: (2025)
Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data
by: Ran, Yiting, et al.
Published: (2024)