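| Main authors: | Wu, Tingyu; Chen, Zhisheng; Weng, Ziyan; Wang, Shuhe; Li, Chenglong; Zhang, Shuo; Hu, Sen; Wu, Silin; Lan, Qizhen; Wang, Huacan; Chen, Ronghao |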
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Artificial Intelligence; Information Retrieval |
| Online access: | https://arxiv.org/abs/2601.04745 |
| _version_ | 1866910145197375488 |
|---|---|
| author | Wu, Tingyu; Chen, Zhisheng; Weng, Ziyan; Wang, Shuhe; Li, Chenglong; Zhang, Shuo; Hu, Sen; Wu, Silin; Lan, Qizhen; Wang, Huacan; Chen, Ronghao |
| contents | Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, in which actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is available at https://github.com/QuantaAlpha/KnowMeBench. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2601_04745 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions |
| topic | Artificial Intelligence; Information Retrieval |
| url | https://arxiv.org/abs/2601.04745 |