:: Library Catalog

Bibliographic Details
Main Authors:	Anonto, Riad Ahmed; Zabin, Sardar Md. Saffat; Rahman, M. Saifur
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.18369

Similar Items

DFCon: Attention-Driven Supervised Contrastive Learning for Robust Deepfake Detection
by: Shanto, MD Sadik Hossain, et al.
Published: (2025)

Two Decades of Bengali Handwritten Digit Recognition: A Survey
by: Rahman, A. B. M. Ashikur, et al.
Published: (2022)

BeHGAN: Bengali Handwritten Word Generation from Plain Text Using Generative Adversarial Networks
by: Islam, Md. Rakibul, et al.
Published: (2025)

Pay Attention to Where You Looked
by: Berian, Alex, et al.
Published: (2026)

Tell Model Where to Look: Mitigating Hallucinations in MLLMs by Vision-Guided Attention
by: Zhao, Jianfei, et al.
Published: (2025)

Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
by: Tu, Yunbin, et al.
Published: (2024)

BdSL-SPOTER: A Transformer-Based Framework for Bengali Sign Language Recognition with Cultural Adaptation
by: Azad, Sayad Ibna, et al.
Published: (2025)

Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search
by: Tan, Lei, et al.
Published: (2024)

Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
by: Ye, Qinghao, et al.
Published: (2025)

Guided Attention for Interpretable Motion Captioning
by: Radouane, Karim, et al.
Published: (2023)

Leveraging Complementary Attention maps in vision transformers for OCT image analysis
by: Shahgir, Haz Sameen, et al.
Published: (2023)

PULSAR: Graph based Positive Unlabeled Learning with Multi Stream Adaptive Convolutions for Parkinson's Disease Recognition
by: Alam, Md. Zarif Ul, et al.
Published: (2023)

An Efficient Dual-Line Decoder Network with Multi-Scale Convolutional Attention for Multi-organ Segmentation
by: Hassan, Riad, et al.
Published: (2025)

AGIC: Attention-Guided Image Captioning to Improve Caption Relevance
by: Teja, L. D. M. S. Sai, et al.
Published: (2025)

Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models
by: Samin, Niamul Hassan, et al.
Published: (2026)

SSTAF: Spatial-Spectral-Temporal Attention Fusion Transformer for Motor Imagery Classification
by: Muna, Ummay Maria, et al.
Published: (2025)

A light-weight model to generate NDWI from Sentinel-1
by: Ahmed, Saleh Sakib, et al.
Published: (2025)

Beyond Where to Look: Trajectory-Guided Reinforcement Learning for Multimodal RLVR
by: Lu, Jinda, et al.
Published: (2026)

Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model
by: Miah, Abu Saleh Musa, et al.
Published: (2024)

When Safety Blocks Sense: Measuring Semantic Confusion in LLM Refusals
by: Anonto, Riad Ahmed, et al.
Published: (2025)

Impact of Tuning Parameters in Deep Convolutional Neural Network Using a Crack Image Dataset
by: Zabin, Mahe, et al.
Published: (2025)

Enhancement of Bengali OCR by Specialized Models and Advanced Techniques for Diverse Document Types
by: Rabby, AKM Shahariar Azad, et al.
Published: (2024)

Compressed Image Captioning using CNN-based Encoder-Decoder Framework
by: Ridoy, Md Alif Rahman, et al.
Published: (2024)

LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision
by: Fuller, Anthony, et al.
Published: (2025)

GraDeT-HTR: A Resource-Efficient Bengali Handwritten Text Recognition System utilizing Grapheme-based Tokenizer and Decoder-only Transformer
by: Hasan, Md. Mahmudul, et al.
Published: (2025)

Representation Alignment Contrastive Regularization for Multi-Object Tracking
by: Liu, Zhonglin, et al.
Published: (2024)

One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework
by: Bianchi, Lorenzo, et al.
Published: (2025)

Adaptive Enhancement and Dual-Pooling Sequential Attention for Lightweight Underwater Object Detection with YOLOv10
by: Rahman, Md. Mushibur, et al.
Published: (2026)

Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
by: Song, Zijie, et al.
Published: (2023)

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
by: Ma, Ziping, et al.
Published: (2024)

Enhancing Cross-Patient Generalization in AI-Based Parkinson's Disease Detection
by: Albani, Mhd Adnan, et al.
Published: (2025)

Inserting Faces inside Captions: Image Captioning with Attention Guided Merging
by: Tevissen, Yannis, et al.
Published: (2024)

CA-IDD: Cross-Attention Guided Identity-Conditional Diffusion for Identity-Consistent Face Swapping
by: Rana, Md Shohel, et al.
Published: (2026)

PatchAlign3D: Local Feature Alignment for Dense 3D Shape understanding
by: Hadgi, Souhail, et al.
Published: (2026)

LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers
by: Chowdhury, Md Abtahi Majeed, et al.
Published: (2025)

Learning to Look: Cognitive Attention Alignment with Vision-Language Models
by: Yang, Ryan L., et al.
Published: (2025)

The Art of Saying "Maybe": A Conformal Lens for Uncertainty Benchmarking in VLMs
by: Azad, Asif, et al.
Published: (2025)

MoRe: Class Patch Attention Needs Regularization for Weakly Supervised Semantic Segmentation
by: Yang, Zhiwei, et al.
Published: (2024)

Cross Modification Attention Based Deliberation Model for Image Captioning
by: Lian, Zheng, et al.
Published: (2021)

Dynamic Cross-Modal Alignment for Robust Semantic Location Prediction
by: Jing, Liu, et al.
Published: (2024)