Bibliographic Details
Main Authors: Rouhi, Amirreza, Arezoomandan, Solmaz, Peterson, Knut, Woods, Joseph T., Han, David K.
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2506.08968
_version_ 1866918053422301184
author Rouhi, Amirreza
Arezoomandan, Solmaz
Peterson, Knut
Woods, Joseph T.
Han, David K.
contents Object detection models typically rely on predefined categories, limiting their ability to identify novel objects in open-world scenarios. To overcome this constraint, we introduce ADAM: Autonomous Discovery and Annotation Model, a training-free, self-refining framework for open-world object labeling. ADAM leverages large language models (LLMs) to generate candidate labels for unknown objects based on contextual information from known entities within a scene. These labels are paired with visual embeddings from CLIP to construct an Embedding-Label Repository (ELR) that enables inference without category supervision. For a newly encountered unknown object, ADAM retrieves visually similar instances from the ELR and applies frequency-based voting and cross-modal re-ranking to assign a robust label. To further enhance consistency, we introduce a self-refinement loop that re-evaluates repository labels using visual cohesion analysis and k-nearest-neighbor-based majority re-labeling. Experimental results on the COCO and PASCAL datasets demonstrate that ADAM effectively annotates novel categories using only visual and contextual signals, without requiring any fine-tuning or retraining.
format Preprint
id arxiv_https___arxiv_org_abs_2506_08968
institution arXiv
publishDate 2025
record_format arxiv
title ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2506.08968
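
The abstract describes retrieving visually similar instances from the Embedding-Label Repository (ELR), assigning a label by frequency-based voting, and re-labeling repository entries by k-nearest-neighbor majority. The following is a minimal sketch of those two steps only, not the authors' implementation. It assumes toy 2-D vectors in place of CLIP embeddings and cosine similarity as the similarity measure, and it omits the LLM label generation, cross-modal re-ranking, and visual cohesion analysis. All names (`REPO`, `vote_label`, `refine`) and the sample labels are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy repository: (embedding, label) pairs.
# Real embeddings would come from CLIP; these 2-D vectors are stand-ins.
REPO = [
    ([1.0, 0.0], "kite"),
    ([0.9, 0.1], "kite"),
    ([0.8, 0.2], "frisbee"),  # likely mislabeled: it sits among the "kite" entries
    ([0.0, 1.0], "bench"),
    ([0.1, 0.9], "bench"),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, repo, k):
    """Return the k repository entries most similar to the query embedding."""
    ranked = sorted(repo, key=lambda entry: cosine(query, entry[0]), reverse=True)
    return ranked[:k]


def vote_label(query, repo, k=3):
    """Frequency-based voting over the labels of the k nearest entries.

    Ties go to the label seen first, i.e. the one held by the nearest neighbor.
    """
    counts = Counter(label for _, label in top_k(query, repo, k))
    return counts.most_common(1)[0][0]


def refine(repo, k=3):
    """One re-labeling pass: each entry takes the majority label of its
    k nearest neighbors, excluding itself. Votes use the original labels."""
    refined = []
    for i, (embedding, _) in enumerate(repo):
        others = repo[:i] + repo[i + 1:]
        refined.append((embedding, vote_label(embedding, others, k)))
    return refined


if __name__ == "__main__":
    print(vote_label([1.0, 0.05], REPO))
    print([label for _, label in refine(REPO)])
```

In this toy data, the entry labeled "frisbee" is surrounded by "kite" entries, so a single refinement pass re-labels it. How many neighbors to use, and whether to iterate the pass until labels stop changing, are design choices the abstract does not specify.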