Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Hosseini, Hesam, Mighan, Ghazal Hosseini, Afzali, Amirabbas, Amini, Sajjad, Houmansadr, Amir
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition Artificial Intelligence Machine Learning
Online Access:	https://arxiv.org/abs/2411.12589
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

Transformers have revolutionized Computer Vision (CV) through self-attention mechanisms. However, their complexity makes latent token representations difficult to interpret. We introduce ULTra, a framework for interpreting Transformer embeddings and uncovering meaningful semantic patterns within them. ULTra enables unsupervised semantic segmentation using pre-trained models without requiring fine-tuning. Additionally, we propose a self-supervised training approach that refines segmentation performance by learning an external transformation matrix without modifying the underlying model. Our method achieves state-of-the-art performance in unsupervised semantic segmentation, outperforming existing segmentation methods. Furthermore, we validate ULTra for model interpretation on both synthetic and real-world scenarios, including Object Selection and interpretable text summarization using LLMs, demonstrating its broad applicability in explaining the semantic structure of latent token representations.

Similar Items