| Main Authors: | Agarwal, Lakshita; Verma, Bindu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.16788 |
Similar Items
Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism
by: Agarwal, Lakshita, et al.
Published: (2025)
Advanced Chest X-Ray Analysis via Transformer-Based Image Descriptors and Cross-Model Attention Mechanism
by: Agarwal, Lakshita, et al.
Published: (2025)
Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time
by: Masala, Mihai, et al.
Published: (2025)
DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning
by: Gao, Yifeng, et al.
Published: (2025)
MMLNB: Multi-Modal Learning for Neuroblastoma Subtyping Classification Assisted with Textual Description Generation
by: Chen, Huangwei, et al.
Published: (2025)
Brain Stroke Detection and Classification Using CT Imaging with Transformer Models and Explainable AI
by: Qari, Shomukh, et al.
Published: (2025)
XMACNet: An Explainable Lightweight Attention based CNN with Multi Modal Fusion for Chili Disease Classification
by: Ray, Tapon Kumer, et al.
Published: (2026)
Towards Multi-Task Multi-Modal Models: A Video Generative Perspective
by: Yu, Lijun
Published: (2024)
Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers
by: Sun, Bohang, et al.
Published: (2025)
Federated Transformer-GNN for Privacy-Preserving Brain Tumor Localization with Modality-Level Explainability
by: Protani, Andrea, et al.
Published: (2026)
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
by: Cai, Minghong, et al.
Published: (2024)
MICA: Towards Explainable Skin Lesion Diagnosis via Multi-Level Image-Concept Alignment
by: Bie, Yequan, et al.
Published: (2024)
REVEAL: Reasoning-Enhanced Forensic Evidence Analysis for Explainable AI-Generated Image Detection
by: Cao, Huangsen, et al.
Published: (2025)
Towards Quantitative Evaluation of Explainable AI Methods for Deepfake Detection
by: Tsigos, Konstantinos, et al.
Published: (2024)
Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics
by: Gotin, Georgii, et al.
Published: (2025)
ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health Classification
by: Pellicer, Alvaro Lopez, et al.
Published: (2025)
A Framework for Evaluating Zero-Shot Image Generation in Concept-based Explainability
by: Astolfi, Giacomo, et al.
Published: (2026)
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
by: Hu, Teng, et al.
Published: (2025)
MIRAGE: Towards AI-Generated Image Detection in the Wild
by: Xia, Cheng, et al.
Published: (2025)
Explainable, Multi-modal Wound Infection Classification from Images Augmented with Generated Captions
by: Busaranuvong, Palawat, et al.
Published: (2025)
Towards Counterfactual and Contrastive Explainability and Transparency of DCNN Image Classifiers
by: Tariq, Syed Ali, et al.
Published: (2025)
PASSION: Towards Effective Incomplete Multi-Modal Medical Image Segmentation with Imbalanced Missing Rates
by: Shi, Junjie, et al.
Published: (2024)
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
by: Zhang, Fanrui, et al.
Published: (2025)
AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI
by: Fan, Fanda, et al.
Published: (2024)
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
by: Rawal, Ishaan Singh, et al.
Published: (2023)
Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization
by: Khan, Anas Anwarul Haq, et al.
Published: (2025)
Enhancing Osteoporosis Detection: An Explainable Multi-Modal Learning Framework with Feature Fusion and Variable Clustering
by: Chagahi, Mehdi Hosseini, et al.
Published: (2024)
Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality
by: Chen, Sishuo, et al.
Published: (2024)
DIVE: Towards Descriptive and Diverse Visual Commonsense Generation
by: Park, Jun-Hyung, et al.
Published: (2024)
Self-Corrected Image Generation with Explainable Latent Rewards
by: Luo, Yinyi, et al.
Published: (2026)
MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
by: Ding, Yanbo, et al.
Published: (2024)
Red Teaming Models for Hyperspectral Image Analysis Using Explainable AI
by: Zaigrajew, Vladimir, et al.
Published: (2024)
CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers
by: Marmon, Andrew, et al.
Published: (2024)
Robust Multiple Description Neural Video Codec with Masked Transformer for Dynamic and Noisy Networks
by: Hu, Xinyue, et al.
Published: (2024)
Sensor-Adaptive Flood Mapping with Pre-trained Multi-Modal Transformers across SAR and Multispectral Modalities
by: Tanaka, Tomohiro, et al.
Published: (2025)
Contrastive Learning-based Multi Modal Architecture for Emoticon Prediction by Employing Image-Text Pairs
by: Pandey, Ananya, et al.
Published: (2024)
Ivy-Fake: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection
by: Jiang, Changjiang, et al.
Published: (2025)
On the Effectiveness of Methods and Metrics for Explainable AI in Remote Sensing Image Scene Classification
by: Klotz, Jonas, et al.
Published: (2025)
Multi-language Video Subtitle Dataset for Image-based Text Recognition
by: Singkhornart, Thanadol, et al.
Published: (2024)
Exploring Explainability in Video Action Recognition
by: Saha, Avinab, et al.
Published: (2024)