Bibliographic Details
Main Authors: Wu, Tingyu, Chen, Zhisheng, Weng, Ziyan, Wang, Shuhe, Li, Chenglong, Zhang, Shuo, Hu, Sen, Wu, Silin, Lan, Qizhen, Wang, Huacan, Chen, Ronghao
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence, Information Retrieval
Online Access: https://arxiv.org/abs/2601.04745
_version_ 1866910145197375488
author Wu, Tingyu
Chen, Zhisheng
Weng, Ziyan
Wang, Shuhe
Li, Chenglong
Zhang, Shuo
Hu, Sen
Wu, Silin
Lan, Qizhen
Wang, Huacan
Chen, Ronghao
contents Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, where actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is available at https://github.com/QuantaAlpha/KnowMeBench.
format Preprint
id arxiv_https___arxiv_org_abs_2601_04745
institution arXiv
publishDate 2026
record_format arxiv
title KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions
topic Artificial Intelligence
Information Retrieval
url https://arxiv.org/abs/2601.04745