| Main Authors: | Min, Chen; Li, Chengyang; Kong, Fanjie; Zhu, Qi; Zhao, Dawei; Xiao, Liang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2601.07273 |
| _version_ | 1866918283426398208 |
|---|---|
| author | Min, Chen; Li, Chengyang; Kong, Fanjie; Zhu, Qi; Zhao, Dawei; Xiao, Liang |
| author_facet | Min, Chen; Li, Chengyang; Kong, Fanjie; Zhu, Qi; Zhao, Dawei; Xiao, Liang |
| contents | This paper presents GenDet, a framework that redefines object detection as an image generation task. In contrast to traditional approaches, GenDet uses generative modeling: it conditions on the input image and directly generates bounding boxes with semantic annotations in the original image space. GenDet builds a conditional generation architecture on the large-scale pre-trained Stable Diffusion model and formulates the detection task as semantic constraints within the latent space. This enables precise control over bounding box positions and category attributes while preserving the flexibility of the generative model. The approach bridges the gap between generative models and discriminative tasks and offers a new perspective for constructing unified visual understanding systems. Systematic experiments demonstrate that GenDet achieves accuracy competitive with discriminative detectors while retaining the flexibility characteristic of generative methods. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2601_07273 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | GenDet: Painting Colored Bounding Boxes on Images via Diffusion Model for Object Detection |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2601.07273 |