| Main Authors: | Zheng, Siyuan, Liu, Pai, Chen, Xi, Dong, Jizheng, Jia, Sihan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.23337 |
Similar Items
Eval4Sim: An Evaluation Framework for Persona Simulation
by: Bao, Eliseo, et al.
Published: (2026)
PersonaTwin: A Multi-Tier Prompt Conditioning Framework for Generating and Evaluating Personalized Digital Twins
by: Chen, Sihan, et al.
Published: (2025)
PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants
by: Zhao, Zheng, et al.
Published: (2025)
Crafting Customisable Characters with LLMs: A Persona-Driven Role-Playing Agent Framework
by: Yang, Bohao, et al.
Published: (2024)
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator
by: Fan, Zhihao, et al.
Published: (2024)
Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark
by: Yang, Funing, et al.
Published: (2024)
BaziQA-Benchmark: Evaluating Symbolic and Temporally Compositional Reasoning in Large Language Models
by: Chen, Jiangxi, et al.
Published: (2026)
Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI
by: Maiya, Sharan, et al.
Published: (2025)
OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas
by: Wang, Xiaoyang, et al.
Published: (2025)
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
by: Chen, Runjin, et al.
Published: (2025)
How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation
by: Li, Rui, et al.
Published: (2025)
LaRe: Latent Refocusing for Multimodal Reasoning
by: Ma, Jizheng, et al.
Published: (2025)
ZiGong 1.0: A Large Language Model for Financial Credit
by: Lei, Yu, et al.
Published: (2025)
AstroMind: A High-Fidelity Benchmark for Spacecraft Behavior Reasoning Based on Large Language Models
by: Liu, Hao, et al.
Published: (2026)
CharacterBench: Benchmarking Character Customization of Large Language Models
by: Zhou, Jinfeng, et al.
Published: (2024)
Beyond Static Benchmarks: Synthesizing Harmful Content via Persona-based Simulation for Robust Evaluation
by: Lee, Huije, et al.
Published: (2026)
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
by: Fatemi, Bahare, et al.
Published: (2024)
Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation
by: Shin, Jisu, et al.
Published: (2025)
PersonaMath: Boosting Mathematical Reasoning via Persona-Driven Data Augmentation
by: Luo, Jing, et al.
Published: (2024)
A Multi-Task Role-Playing Agent Capable of Imitating Character Linguistic Styles
by: Chen, Siyuan, et al.
Published: (2024)
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation
by: Zeng, Zhongshen, et al.
Published: (2023)
Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning
by: Chen, Kang, et al.
Published: (2024)
CharacterGPT: A Persona Reconstruction Framework for Role-Playing Agents
by: Park, Jeiyoon, et al.
Published: (2024)
A Persona-Based Evaluation Framework for Pluralistic Alignment in Generative AI
by: Karagoz, Atahan
Published: (2026)
Beyond Perplexity: Character Distribution Signatures and the MDTA Benchmark for AI Text Detection
by: Narayanasamy, Priyadarshan, et al.
Published: (2026)
Persona-Based Conversational AI: State of the Art and Challenges
by: Liu, Junfeng, et al.
Published: (2022)
Quantifying the Persona Effect in LLM Simulations
by: Hu, Tiancheng, et al.
Published: (2024)
LTLBench: Towards Benchmarks for Evaluating Temporal Reasoning in Large Language Models
by: Tang, Weizhi, et al.
Published: (2024)
Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations
by: Juneja, Prerna, et al.
Published: (2026)
TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models
by: Chu, Zheng, et al.
Published: (2023)
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning
by: Bhattacharya, Debarpan, et al.
Published: (2025)
CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation
by: Tu, Quan, et al.
Published: (2024)
TRAVELER: A Benchmark for Evaluating Temporal Reasoning across Vague, Implicit and Explicit References
by: Kenneweg, Svenja, et al.
Published: (2025)
Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
by: Truong, Kimberly Le, et al.
Published: (2025)
Adaptive Interviewing for Persona Simulation in LLMs: Evidence-Grounded Reasoning Improves Decision Alignment
by: Su, Ruoxi, et al.
Published: (2026)
DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition
by: Kim, Siun, et al.
Published: (2026)
Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History
by: Kim, Serin, et al.
Published: (2026)
Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas
by: Kwok, Louis, et al.
Published: (2024)
Enhancing Persona Following at Decoding Time via Dynamic Importance Estimation for Role-Playing Agents
by: Liu, Yuxin, et al.
Published: (2026)
TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation
by: Du, Bangde, et al.
Published: (2025)