Staff View: GenDet: Painting Colored Bounding Boxes on Images via Diffusion Model for Object Detection :: Library Catalog

Bibliographic Details
Main Authors:	Min, Chen; Li, Chengyang; Kong, Fanjie; Zhu, Qi; Zhao, Dawei; Xiao, Liang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.07273

_version_	1866918283426398208
author	Min, Chen; Li, Chengyang; Kong, Fanjie; Zhu, Qi; Zhao, Dawei; Xiao, Liang
author_facet	Min, Chen; Li, Chengyang; Kong, Fanjie; Zhu, Qi; Zhao, Dawei; Xiao, Liang
contents	This paper presents GenDet, a framework that recasts object detection as an image generation task. In contrast to traditional approaches, GenDet uses generative modeling: it conditions on the input image and directly generates bounding boxes with semantic annotations in the original image space. It builds a conditional generation architecture on the large-scale pre-trained Stable Diffusion model and formulates the detection task as semantic constraints within the latent space, which enables precise control over bounding box positions and category attributes. The method bridges generative models and discriminative tasks and offers a new perspective for constructing unified visual understanding systems. Systematic experiments show that GenDet achieves accuracy competitive with discriminative detectors while retaining the flexibility characteristic of generative methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_07273
institution	arXiv
publishDate	2026
record_format	arxiv
title	GenDet: Painting Colored Bounding Boxes on Images via Diffusion Model for Object Detection
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2601.07273

Similar Items