MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Kim, Jeongsoo; Nang, Jongho; Choe, Junsuk
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2409.03516

_version_	1866909306387955712
author	Kim, Jeongsoo; Nang, Jongho; Choe, Junsuk
author_facet	Kim, Jeongsoo; Nang, Jongho; Choe, Junsuk
contents	Recent Vision Transformer (ViT)-based methods for Image Super-Resolution have demonstrated impressive performance. However, they suffer from significant complexity, resulting in high inference times and memory usage. Additionally, ViT models using Window Self-Attention (WSA) face challenges in processing regions outside their windows. To address these issues, we propose the Low-to-high Multi-Level Transformer (LMLT), which employs attention with varying feature sizes for each head. LMLT divides image features along the channel dimension, gradually reduces spatial size for lower heads, and applies self-attention to each head. This approach effectively captures both local and global information. By integrating the results from lower heads into higher heads, LMLT overcomes the window boundary issues in self-attention. Extensive experiments show that our model significantly reduces inference time and GPU memory usage while maintaining or even surpassing the performance of state-of-the-art ViT-based Image Super-Resolution methods. Our code is available at https://github.com/jwgdmkj/LMLT.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_03516
institution	arXiv
publishDate	2024
record_format	arxiv
title	LMLT: Low-to-high Multi-Level Vision Transformer for Image Super-Resolution
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2409.03516