Bibliographic Details
Main Authors: Yuan, Jin; Chen, Shikai; Zhang, Yao; Shi, Zhongchao; Geng, Xin; Fan, Jianping; Rui, Yong
Format: Preprint
Published: 2022
Online Access: https://arxiv.org/abs/2203.04049
author Yuan, Jin
Chen, Shikai
Zhang, Yao
Shi, Zhongchao
Geng, Xin
Fan, Jianping
Rui, Yong
contents Multi-label classification aims to recognize multiple objects or attributes in an image. However, it is challenging to learn a label graph that effectively characterizes inter-label correlations or dependencies. Current methods often use label co-occurrence probabilities computed on the training set as the adjacency matrix to model these correlations, which ties the graph to the dataset and limits the model's generalization ability. In this paper, we propose the Graph Attention Transformer Network (GATN), a general framework for multi-label image classification that can effectively mine complex inter-label relationships. First, we use the cosine similarity between label word embeddings as the initial correlation matrix, which can represent rich semantic information. Subsequently, we design a graph attention transformer layer that transfers this adjacency matrix to the current domain. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on three datasets.
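The abstract contrasts two ways of building the label adjacency matrix: co-occurrence probabilities from the training set, and cosine similarity between label word embeddings. The following is a minimal sketch of both, not the authors' implementation. The function names, the toy 3-dimensional embeddings, and the toy annotations are all hypothetical and chosen only for illustration. The graph attention transformer layer itself is not sketched, since the record does not describe it in detail.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between label embedding vectors."""
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in embeddings]
    n = len(embeddings)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            denom = norms[i] * norms[j]
            # Guard against zero vectors (assumption: treat them as unrelated).
            sim[i][j] = dot / denom if denom > 0 else 0.0
    return sim


def cooccurrence_matrix(label_sets, num_labels):
    """Estimate P(label j present | label i present) from annotations."""
    counts = [0] * num_labels
    pairs = [[0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        labels = set(labels)  # ignore repeated labels within one image
        for i in labels:
            counts[i] += 1
            for j in labels:
                if i != j:
                    pairs[i][j] += 1
    return [
        [pairs[i][j] / counts[i] if counts[i] else 0.0 for j in range(num_labels)]
        for i in range(num_labels)
    ]


# Hypothetical toy data: three labels with made-up 3-d "word embeddings".
toy_embeddings = [
    [1.0, 0.0, 0.0],  # label 0
    [1.0, 1.0, 0.0],  # label 1
    [0.0, 0.0, 1.0],  # label 2
]
semantic_adjacency = cosine_similarity_matrix(toy_embeddings)

# Hypothetical toy annotations: each set lists the labels present in one image.
toy_annotations = [{0, 1}, {0}, {1, 2}]
cooccurrence_adjacency = cooccurrence_matrix(toy_annotations, num_labels=3)
```

In this sketch the cosine similarity matrix is symmetric and depends only on the embeddings, whereas the co-occurrence matrix is generally asymmetric and changes with the training annotations, which is the dataset dependence the abstract points to.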
format Preprint
id arxiv_https___arxiv_org_abs_2203_04049
institution arXiv
publishDate 2022
record_format arxiv
title Graph Attention Transformer Network for Multi-Label Image Classification
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2203.04049