Staff View: AttentionDrop: A Novel Regularization Method for Transformer Models :: Library Catalog

Bibliographic Details
Main Authors:	Baig, Mirza Samad Ahmed; Gillani, Syeda Anshrah; Khan, Abdul Akbar; Shah, Shahid Munir; Khan, Muhammad Omer
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2504.12088

_version_	1866916956863463424
author	Baig, Mirza Samad Ahmed; Gillani, Syeda Anshrah; Khan, Abdul Akbar; Shah, Shahid Munir; Khan, Muhammad Omer
author_facet	Baig, Mirza Samad Ahmed; Gillani, Syeda Anshrah; Khan, Abdul Akbar; Shah, Shahid Munir; Khan, Muhammad Omer
contents	Transformer-based architectures achieve state-of-the-art performance across a wide range of tasks in natural language processing, computer vision, and speech processing. However, their large capacity often leads to overfitting, especially when training data is limited or noisy. This work proposes AttentionDrop, a unified family of stochastic regularization techniques that operate directly on the self-attention distributions, in three variants. Hard Attention Masking randomly zeroes out top-k attention logits per query to encourage diverse context utilization. Blurred Attention Smoothing applies a dynamic Gaussian convolution over attention logits to diffuse overly peaked distributions. Consistency-Regularized AttentionDrop enforces output stability under multiple independent AttentionDrop perturbations via a KL-based consistency loss. The reported results show that AttentionDrop consistently improves accuracy, calibration, and adversarial robustness over standard Dropout, DropConnect, and R-Drop baselines.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_12088
institution	arXiv
publishDate	2025
record_format	arxiv
title	AttentionDrop: A Novel Regularization Method for Transformer Models
topic	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2504.12088
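
Illustrative note: the three variants named in the abstract can be sketched for a single query's attention logits as below. This is a minimal sketch under stated assumptions, not the authors' implementation. All function names and parameters (`k`, `p`, `sigma`, `eps`) are hypothetical. The abstract says logits are "zeroed out"; here they are masked to negative infinity so that the masked keys receive zero attention weight, which is an assumption. The "dynamic" aspect of the Gaussian convolution is not modeled (a fixed `sigma` is used), and the consistency loss is shown as a symmetric KL between two perturbed distributions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax; -inf logits get exactly zero weight."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_attention_mask(logits, k, p, rng):
    """Mask each of the k largest logits independently with probability p.

    Assumption: masking sets the logit to -inf (zero weight after softmax).
    """
    if not 0 <= k < len(logits):
        raise ValueError("k must leave at least one key unmasked")
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    masked = list(logits)
    for i in order[:k]:
        if rng.random() < p:
            masked[i] = -math.inf
    return masked


def gaussian_blur(logits, sigma):
    """Smooth logits along the key axis with a Gaussian kernel.

    The kernel is truncated at 3*sigma and renormalized at the borders.
    Assumption: sigma is fixed here; the paper's kernel is described as dynamic.
    """
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in offsets]
    n = len(logits)
    out = []
    for i in range(n):
        acc, norm = 0.0, 0.0
        for d, w in zip(offsets, kernel):
            j = i + d
            if 0 <= j < n:
                acc += w * logits[j]
                norm += w
        out.append(acc / norm)
    return out


def symmetric_kl(p_dist, q_dist, eps=1e-12):
    """Symmetric KL divergence between two discrete distributions.

    eps keeps the value finite when one distribution has zero entries.
    """
    def kl(a, b):
        return sum(x * math.log((x + eps) / (y + eps))
                   for x, y in zip(a, b) if x > 0)
    return 0.5 * (kl(p_dist, q_dist) + kl(q_dist, p_dist))


# Two independent perturbations of the same logits, compared with the KL term.
rng = random.Random(0)
logits = [4.0, 1.0, 0.5, 3.5, 0.0]
view_a = softmax(hard_attention_mask(logits, k=2, p=0.5, rng=rng))
view_b = softmax(hard_attention_mask(logits, k=2, p=0.5, rng=rng))
consistency = symmetric_kl(view_a, view_b)
smoothed = softmax(gaussian_blur(logits, sigma=1.0))
```

In a training loop, a term like `consistency` would presumably be added to the task loss with some weight; that weighting is not specified in the abstract and is left out here.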

Similar Items