Bibliographic Details
Main Authors: Min, Chen, Li, Chengyang, Kong, Fanjie, Zhu, Qi, Zhao, Dawei, Xiao, Liang
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2601.07273
author Min, Chen
Li, Chengyang
Kong, Fanjie
Zhu, Qi
Zhao, Dawei
Xiao, Liang
contents This paper presents GenDet, a framework that recasts object detection as an image generation task. In contrast to traditional approaches, GenDet uses generative modeling: it conditions on the input image and directly generates bounding boxes with semantic annotations in the original image space. GenDet builds a conditional generation architecture on the large-scale pre-trained Stable Diffusion model and formulates the detection task as semantic constraints within the latent space. This design enables precise control over bounding box positions and category attributes while preserving the flexibility of the generative model. The method bridges the gap between generative models and discriminative tasks and offers a fresh perspective for constructing unified visual understanding systems. Systematic experiments demonstrate that GenDet achieves accuracy competitive with discriminative detectors while retaining the flexibility characteristic of generative methods.
format Preprint
id arxiv_https___arxiv_org_abs_2601_07273
institution arXiv
publishDate 2026
record_format arxiv
title GenDet: Painting Colored Bounding Boxes on Images via Diffusion Model for Object Detection
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2601.07273