| Main Authors: | Rouhi, Amirreza; Arezoomandan, Solmaz; Peterson, Knut; Woods, Joseph T.; Han, David K. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2506.08968 |
| _version_ | 1866918053422301184 |
|---|---|
| author | Rouhi, Amirreza; Arezoomandan, Solmaz; Peterson, Knut; Woods, Joseph T.; Han, David K. |
| author_facet | Rouhi, Amirreza; Arezoomandan, Solmaz; Peterson, Knut; Woods, Joseph T.; Han, David K. |
| contents | Object detection models typically rely on predefined categories, limiting their ability to identify novel objects in open-world scenarios. To overcome this constraint, we introduce ADAM: Autonomous Discovery and Annotation Model, a training-free, self-refining framework for open-world object labeling. ADAM leverages large language models (LLMs) to generate candidate labels for unknown objects based on contextual information from known entities within a scene. These labels are paired with visual embeddings from CLIP to construct an Embedding-Label Repository (ELR) that enables inference without category supervision. For a newly encountered unknown object, ADAM retrieves visually similar instances from the ELR and applies frequency-based voting and cross-modal re-ranking to assign a robust label. To further enhance consistency, we introduce a self-refinement loop that re-evaluates repository labels using visual cohesion analysis and k-nearest-neighbor-based majority re-labeling. Experimental results on the COCO and PASCAL datasets demonstrate that ADAM effectively annotates novel categories using only visual and contextual signals, without requiring any fine-tuning or retraining. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2506_08968 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2506.08968 |
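
The abstract in this record describes retrieving visually similar instances from the Embedding-Label Repository (ELR), voting on their labels by frequency, and re-labeling repository entries by k-nearest-neighbor majority. The following is a minimal sketch of those two steps only, not the authors' implementation. It assumes toy 2-D vectors in place of CLIP embeddings, hard-coded labels in place of LLM-generated ones, and hypothetical function names (`vote_label`, `refine`). Cross-modal re-ranking and visual cohesion analysis are omitted.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, repo, k):
    """Return the k (embedding, label) entries most similar to the query."""
    return sorted(repo, key=lambda e: cosine(query, e[0]), reverse=True)[:k]

def vote_label(query, repo, k=3):
    """Frequency-based voting: the most common label among the k nearest entries.
    On a tie, Counter keeps first-seen order, so the closer neighbour wins."""
    counts = Counter(label for _, label in top_k(query, repo, k))
    return counts.most_common(1)[0][0]

def refine(repo, k=3):
    """One pass of kNN majority re-labeling: each entry takes the majority
    label of its k nearest *other* entries in the repository."""
    refined = []
    for i, (emb, _) in enumerate(repo):
        others = repo[:i] + repo[i + 1:]
        refined.append((emb, vote_label(emb, others, k)))
    return refined

# Toy repository (assumed data): one noisy "umbrella" label inside the kite cluster.
repo = [
    ([1.0, 0.0], "kite"),
    ([0.9, 0.1], "kite"),
    ([0.8, 0.2], "umbrella"),
    ([0.95, 0.05], "kite"),
    ([0.0, 1.0], "bench"),
    ([0.1, 0.9], "bench"),
    ([0.05, 0.95], "bench"),
]

label = vote_label([0.9, 0.05], repo, k=3)       # label for a new unknown object
cleaned = [lab for _, lab in refine(repo, k=3)]  # labels after one refinement pass
```

In this toy data the noisy "umbrella" entry is outvoted by its three nearest neighbours, which is the kind of consistency gain the abstract attributes to the self-refinement loop. The real system's choice of k, similarity measure, and stopping rule are not stated in this record.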