Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Yuan, Jin; Chen, Shikai; Zhang, Yao; Shi, Zhongchao; Geng, Xin; Fan, Jianping; Rui, Yong
Format:	Preprint
Published:	2022
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2203.04049

author	Yuan, Jin; Chen, Shikai; Zhang, Yao; Shi, Zhongchao; Geng, Xin; Fan, Jianping; Rui, Yong
contents	Multi-label classification aims to recognize multiple objects or attributes in an image. However, it is challenging to learn label graphs that properly characterize the correlations or dependencies between labels. Current methods often use the co-occurrence probabilities of labels in the training set as the adjacency matrix that models these correlations; this ties the graph closely to the dataset and limits the model's generalization ability. In this paper, we propose the Graph Attention Transformer Network (GATN), a general framework for multi-label image classification that can effectively mine complex inter-label relationships. First, we use the cosine similarity between label word embeddings as the initial correlation matrix, which captures rich semantic information. We then design a graph attention transformer layer that transfers this adjacency matrix so that it adapts to the current domain. Extensive experiments show that the proposed method achieves state-of-the-art performance on three datasets.
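The abstract contrasts two ways of building an initial label graph: conditional co-occurrence probabilities counted on the training set, and cosine similarity between label word embeddings. The Python sketch below computes both for toy data. It is an illustration under stated assumptions, not the authors' GATN code: the function names `cooccurrence_adjacency` and `cosine_adjacency`, the toy embeddings, and the choice to leave the co-occurrence diagonal at zero are all hypothetical, and the graph attention transformer layer that subsequently adapts the matrix is not reproduced.

```python
import math
from typing import List, Sequence


def cooccurrence_adjacency(labels_per_image: Sequence[Sequence[int]],
                           num_labels: int) -> List[List[float]]:
    """Estimate A[i][j] = P(label j present | label i present) from annotations.

    Assumption: the diagonal is left at 0.0; self-loops are handled elsewhere.
    """
    counts = [0] * num_labels                        # images containing label i
    pairs = [[0] * num_labels for _ in range(num_labels)]
    for labels in labels_per_image:
        present = set(labels)                        # ignore duplicate labels
        for i in present:
            counts[i] += 1
            for j in present:
                if i != j:
                    pairs[i][j] += 1                 # images containing both i and j
    return [[pairs[i][j] / counts[i] if counts[i] else 0.0
             for j in range(num_labels)]
            for i in range(num_labels)]


def cosine_adjacency(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """A[i][j] = cosine similarity between the word embeddings of labels i and j."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in embeddings]
    n = len(embeddings)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            denom = norms[i] * norms[j]
            adj[i][j] = dot / denom if denom else 0.0  # zero vectors get no edges
    return adj


if __name__ == "__main__":
    # Hypothetical 3-d "word embeddings" for three labels.
    emb = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
    for row in cosine_adjacency(emb):
        print([round(x, 3) for x in row])

    # Hypothetical training annotations: label indices present in each image.
    train = [[0, 1], [0], [0, 1], [2]]
    for row in cooccurrence_adjacency(train, 3):
        print([round(x, 3) for x in row])
```

The co-occurrence matrix is asymmetric and changes whenever the training annotations change, whereas the cosine matrix is symmetric and depends only on the embeddings, so it can be built without any image annotations.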
format	Preprint
id	arxiv_https___arxiv_org_abs_2203_04049
institution	arXiv
publishDate	2022
record_format	arxiv
title	Graph Attention Transformer Network for Multi-Label Image Classification
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2203.04049

Similar Items