Staff View: HiNS: Hierarchical Negative Sampling for More Comprehensive Memory Retrieval Embedding Model :: Library Catalog

Bibliographic Details
Main Authors:	Tian, Motong; Wong, Allen P.; Mao, Mingjun; Zhou, Wangchunshu
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.14857

_version_	1866908779632656384
author	Tian, Motong; Wong, Allen P.; Mao, Mingjun; Zhou, Wangchunshu
author_facet	Tian, Motong; Wong, Allen P.; Mao, Mingjun; Zhou, Wangchunshu
contents	Memory-augmented language agents rely on embedding models for effective memory retrieval. However, existing training data construction overlooks a critical factor: the hierarchical difficulty of negative samples and their natural distribution in human-agent interactions. In practice, some negatives are semantically close distractors while others are trivially irrelevant, and natural dialogue exhibits structured proportions of these types. Current approaches that use synthetic or uniformly sampled negatives fail to reflect this diversity, which limits the ability of embedding models to learn the nuanced discrimination essential for robust memory retrieval. In this work, we propose a principled data construction framework, HiNS, that explicitly models negative sample difficulty tiers and incorporates empirically grounded negative ratios derived from conversational data, enabling the training of embedding models with substantially improved retrieval fidelity and generalization in memory-intensive tasks. Experiments show significant improvements: on LoCoMo, F1/BLEU-1 gains of 3.27%/3.30% (MemoryOS) and 1.95%/1.78% (Mem0); on PERSONAMEM, total score improvements of 1.19% (MemoryOS) and 2.55% (Mem0).
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_14857
institution	arXiv
publishDate	2026
record_format	arxiv
title	HiNS: Hierarchical Negative Sampling for More Comprehensive Memory Retrieval Embedding Model
topic	Computation and Language
url	https://arxiv.org/abs/2601.14857
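
The abstract describes drawing negatives from difficulty tiers in proportions taken from conversational data. A minimal sketch of that general idea follows. It is not the authors' implementation: the tier names ("hard", "medium", "easy"), the weights, and the function sample_tiered_negatives are all hypothetical, since the record does not state the actual tiers or ratios used by HiNS.

```python
import random
from typing import Dict, List, Sequence


def sample_tiered_negatives(
    pools: Dict[str, Sequence[str]],
    weights: Dict[str, float],
    k: int,
    seed: int = 0,
) -> List[str]:
    """Draw k negatives, splitting k across tiers in proportion to weights.

    Tier names and weights are illustrative assumptions, not values
    reported for HiNS.
    """
    if set(pools) != set(weights):
        raise ValueError("pools and weights must name the same tiers")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Largest-remainder allocation so the per-tier counts sum to exactly k.
    exact = {tier: k * w / total for tier, w in weights.items()}
    counts = {tier: int(x) for tier, x in exact.items()}
    leftover = k - sum(counts.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for tier in by_remainder[:leftover]:
        counts[tier] += 1

    rng = random.Random(seed)
    negatives: List[str] = []
    for tier, n in counts.items():
        pool = list(pools[tier])
        if n > len(pool):
            raise ValueError(f"tier {tier!r} has only {len(pool)} candidates, need {n}")
        # Sample without replacement inside each tier.
        negatives.extend(rng.sample(pool, n))
    return negatives


if __name__ == "__main__":
    # Hypothetical candidate pools for one query.
    pools = {
        "hard": [f"hard_{i}" for i in range(10)],
        "medium": [f"medium_{i}" for i in range(10)],
        "easy": [f"easy_{i}" for i in range(10)],
    }
    # Hypothetical 2:3:5 mix; a real mix would be estimated from dialogue data.
    print(sample_tiered_negatives(pools, {"hard": 2, "medium": 3, "easy": 5}, k=10))
```

With the hypothetical 2:3:5 weights and k=10, the call returns two hard, three medium, and five easy negatives. In a training pipeline these would be paired with the query and its positive memory to form a contrastive example.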

Similar Items