:: Library Catalog

Bibliographic Details
Main Authors:	Agarwal, Lakshita; Verma, Bindu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.16788

Similar Items

Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism
by: Agarwal, Lakshita, et al.
Published: (2025)

Advanced Chest X-Ray Analysis via Transformer-Based Image Descriptors and Cross-Model Attention Mechanism
by: Agarwal, Lakshita, et al.
Published: (2025)

Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time
by: Masala, Mihai, et al.
Published: (2025)

DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning
by: Gao, Yifeng, et al.
Published: (2025)

MMLNB: Multi-Modal Learning for Neuroblastoma Subtyping Classification Assisted with Textual Description Generation
by: Chen, Huangwei, et al.
Published: (2025)

Brain Stroke Detection and Classification Using CT Imaging with Transformer Models and Explainable AI
by: Qari, Shomukh, et al.
Published: (2025)

XMACNet: An Explainable Lightweight Attention based CNN with Multi Modal Fusion for Chili Disease Classification
by: Ray, Tapon Kumer, et al.
Published: (2026)

Towards Multi-Task Multi-Modal Models: A Video Generative Perspective
by: Yu, Lijun
Published: (2024)

Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers
by: Sun, Bohang, et al.
Published: (2025)

Federated Transformer-GNN for Privacy-Preserving Brain Tumor Localization with Modality-Level Explainability
by: Protani, Andrea, et al.
Published: (2026)

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
by: Cai, Minghong, et al.
Published: (2024)

MICA: Towards Explainable Skin Lesion Diagnosis via Multi-Level Image-Concept Alignment
by: Bie, Yequan, et al.
Published: (2024)

REVEAL: Reasoning-Enhanced Forensic Evidence Analysis for Explainable AI-Generated Image Detection
by: Cao, Huangsen, et al.
Published: (2025)

Towards Quantitative Evaluation of Explainable AI Methods for Deepfake Detection
by: Tsigos, Konstantinos, et al.
Published: (2024)

Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics
by: Gotin, Georgii, et al.
Published: (2025)

ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health Classification
by: Pellicer, Alvaro Lopez, et al.
Published: (2025)

A Framework for Evaluating Zero-Shot Image Generation in Concept-based Explainability
by: Astolfi, Giacomo, et al.
Published: (2026)

PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
by: Hu, Teng, et al.
Published: (2025)

MIRAGE: Towards AI-Generated Image Detection in the Wild
by: Xia, Cheng, et al.
Published: (2025)

Explainable, Multi-modal Wound Infection Classification from Images Augmented with Generated Captions
by: Busaranuvong, Palawat, et al.
Published: (2025)

Towards Counterfactual and Contrastive Explainability and Transparency of DCNN Image Classifiers
by: Tariq, Syed Ali, et al.
Published: (2025)

PASSION: Towards Effective Incomplete Multi-Modal Medical Image Segmentation with Imbalanced Missing Rates
by: Shi, Junjie, et al.
Published: (2024)

Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
by: Zhang, Fanrui, et al.
Published: (2025)

AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI
by: Fan, Fanda, et al.
Published: (2024)

Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
by: Rawal, Ishaan Singh, et al.
Published: (2023)

Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization
by: Khan, Anas Anwarul Haq, et al.
Published: (2025)

Enhancing Osteoporosis Detection: An Explainable Multi-Modal Learning Framework with Feature Fusion and Variable Clustering
by: Chagahi, Mehdi Hosseini, et al.
Published: (2024)

Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality
by: Chen, Sishuo, et al.
Published: (2024)

DIVE: Towards Descriptive and Diverse Visual Commonsense Generation
by: Park, Jun-Hyung, et al.
Published: (2024)

Self-Corrected Image Generation with Explainable Latent Rewards
by: Luo, Yinyi, et al.
Published: (2026)

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
by: Ding, Yanbo, et al.
Published: (2024)

Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI
by: Zaigrajew, Vladimir, et al.
Published: (2024)

CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers
by: Marmon, Andrew, et al.
Published: (2024)

Robust Multiple Description Neural Video Codec with Masked Transformer for Dynamic and Noisy Networks
by: Hu, Xinyue, et al.
Published: (2024)

Sensor-Adaptive Flood Mapping with Pre-trained Multi-Modal Transformers across SAR and Multispectral Modalities
by: Tanaka, Tomohiro, et al.
Published: (2025)

Contrastive Learning-based Multi Modal Architecture for Emoticon Prediction by Employing Image-Text Pairs
by: Pandey, Ananya, et al.
Published: (2024)

Ivy-Fake: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection
by: Jiang, Changjiang, et al.
Published: (2025)

On the Effectiveness of Methods and Metrics for Explainable AI in Remote Sensing Image Scene Classification
by: Klotz, Jonas, et al.
Published: (2025)

Multi-language Video Subtitle Dataset for Image-based Text Recognition
by: Singkhornart, Thanadol, et al.
Published: (2024)

Exploring Explainability in Video Action Recognition
by: Saha, Avinab, et al.
Published: (2024)