Table of Contents: :: Library Catalog

Bibliographic Details
Main Authors:	Nan, Xinyu; Mao, Lingtao; Dai, Huangyu; Zheng, Zexin; Sun, Xinyu; Liang, Zihan; Chen, Ben; Ding, Yuqing; Lei, Chenyi; Ou, Wenwu; Li, Han
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.15984

Table of Contents:

Achieving visual semantic understanding requires a unified framework that simultaneously handles object detection, category prediction, and attribute recognition. However, current advanced approaches rely on global similarity and struggle to capture fine-grained category distinctions and category-specific attribute diversity, especially in large-scale e-commerce scenarios. To overcome these challenges, we introduce a detection-guided generative framework that predicts hierarchical category and attribute tokens. For each detected object, we extract refined ROI-level features and employ a BART-based generator to produce semantic tokens in a coarse-to-fine sequence covering category hierarchies and property-value pairs, with support for property-conditioned attribute recognition. Experiments on both large-scale proprietary e-commerce datasets and open-source datasets demonstrate that our approach significantly outperforms existing similarity-based pipelines and multi-stage classification systems, achieving stronger fine-grained recognition and more coherent unified inference.
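
To make the coarse-to-fine output format more concrete, the following is a minimal sketch of how one detected object's labels might be linearized into a target token sequence, and how a decoder prefix for property-conditioned recognition might be formed. The special tokens (`<cat>`, `<prop>`, `<val>`, `<eos>`), the function names, and the example labels are all hypothetical; the abstract does not specify the vocabulary or serialization the authors use, and the detector and BART-based generator are not modeled here.

```python
from typing import Dict, List

# Hypothetical special tokens; the source does not define a vocabulary.
CAT, PROP, VAL, EOS = "<cat>", "<prop>", "<val>", "<eos>"


def build_target_sequence(category_path: List[str],
                          attributes: Dict[str, str]) -> List[str]:
    """Linearize one object's labels: category levels first, then property-value pairs."""
    tokens: List[str] = []
    for level in category_path:  # coarsest category first, finest last
        tokens += [CAT, level]
    for prop, value in attributes.items():  # dicts keep insertion order
        tokens += [PROP, prop, VAL, value]
    tokens.append(EOS)
    return tokens


def build_property_prompt(category_path: List[str], prop: str) -> List[str]:
    """Build a decoder prefix that ends where the value for `prop` would be generated."""
    tokens: List[str] = []
    for level in category_path:
        tokens += [CAT, level]
    tokens += [PROP, prop, VAL]
    return tokens


if __name__ == "__main__":
    path = ["Apparel", "Dress"]            # illustrative labels only
    attrs = {"color": "red", "sleeve": "long"}
    print(build_target_sequence(path, attrs))
    print(build_property_prompt(path, "color"))
```

Under these assumptions, a generator trained on such sequences would emit category tokens before attribute tokens, and the property prompt would constrain decoding to the value of a single requested property.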

Similar Items