Saved in:
| Main Authors: | , , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.02539 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Table of Contents:
- Continual memory augmentation lets computer-using agents (CUAs) learn from prior interactions, but unvetted memories can encode domain-inappropriate or unsafe heuristics--spurious rules that drift from user intent and safety constraints. We introduce VerificAgent, a scalable oversight framework that treats persistent memory as an explicit alignment surface. VerificAgent combines (1) an expert-curated seed of domain knowledge, (2) iterative, trajectory-based memory growth during training, and (3) a post-hoc human fact-checking pass to sanitize accumulated memories before deployment. Evaluated on OSWorld productivity tasks and additional adversarial stress tests, VerificAgent improves task reliability, reduces hallucination-induced failures, and preserves interpretable, auditable guidance--without additional model fine-tuning. By letting humans correct high-impact errors once, the verified memory acts as a frozen safety contract that future agent actions must satisfy. Our results suggest that domain-scoped, human-verified memory offers a scalable oversight mechanism for CUAs, complementing broader alignment strategies by limiting silent policy drift and anchoring agent behavior to the norms and safety constraints of the target domain.