Staff View: ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations :: Library Catalog

Bibliographic Details
Main Authors:	Rouhi, Amirreza; Arezoomandan, Solmaz; Peterson, Knut; Woods, Joseph T.; Han, David K.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.08968

_version_	1866918053422301184
author	Rouhi, Amirreza; Arezoomandan, Solmaz; Peterson, Knut; Woods, Joseph T.; Han, David K.
author_facet	Rouhi, Amirreza; Arezoomandan, Solmaz; Peterson, Knut; Woods, Joseph T.; Han, David K.
contents	Object detection models typically rely on predefined categories, limiting their ability to identify novel objects in open-world scenarios. To overcome this constraint, we introduce ADAM: Autonomous Discovery and Annotation Model, a training-free, self-refining framework for open-world object labeling. ADAM leverages large language models (LLMs) to generate candidate labels for unknown objects based on contextual information from known entities within a scene. These labels are paired with visual embeddings from CLIP to construct an Embedding-Label Repository (ELR) that enables inference without category supervision. For a newly encountered unknown object, ADAM retrieves visually similar instances from the ELR and applies frequency-based voting and cross-modal re-ranking to assign a robust label. To further enhance consistency, we introduce a self-refinement loop that re-evaluates repository labels using visual cohesion analysis and k-nearest-neighbor-based majority re-labeling. Experimental results on the COCO and PASCAL datasets demonstrate that ADAM effectively annotates novel categories using only visual and contextual signals, without requiring any fine-tuning or retraining.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_08968
institution	arXiv
publishDate	2025
record_format	arxiv
title	ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2506.08968
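
The retrieval and re-labeling steps summarized in the contents field can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine`, `vote_label`, `refine_labels`), the value of `k`, and the two-dimensional toy vectors standing in for CLIP embeddings are all assumptions made for illustration, and the cross-modal re-ranking and visual cohesion analysis mentioned in the abstract are omitted.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vote_label(query, repository, k=3):
    """Frequency-based vote over the k entries most similar to `query`.

    `repository` is a list of (embedding, label) pairs, a hypothetical
    stand-in for the Embedding-Label Repository (ELR).
    """
    ranked = sorted(repository, key=lambda e: cosine(query, e[0]), reverse=True)
    counts = Counter(label for _, label in ranked[:k])
    # On a tie, most_common keeps first-seen order, so the label of the
    # most similar neighbor wins.
    return counts.most_common(1)[0][0]


def refine_labels(repository, k=3):
    """One pass of kNN majority re-labeling over the whole repository.

    Every entry is re-labeled from its k nearest *other* entries, always
    voting with the original labels rather than partially updated ones.
    """
    refined = []
    for i, (emb, label) in enumerate(repository):
        others = repository[:i] + repository[i + 1:]
        refined.append((emb, vote_label(emb, others, k) if others else label))
    return refined


# Toy repository: the third entry carries a deliberately noisy label.
elr = [
    ([1.0, 0.0], "kettle"),
    ([0.9, 0.1], "kettle"),
    ([0.8, 0.2], "teapot"),
    ([0.0, 1.0], "surfboard"),
    ([0.1, 0.9], "surfboard"),
]

predicted = vote_label([0.95, 0.05], elr, k=3)
refined_labels = [label for _, label in refine_labels(elr, k=3)]
print(predicted)
print(refined_labels)
```

In this toy setup the noisy entry sits among the "kettle" embeddings, so the re-labeling pass pulls it toward its neighbors' majority label. A real system would use high-dimensional embeddings and an approximate nearest-neighbor index rather than a full sort.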