:: Library Catalog

Bibliographic Details
Main Authors:	Pandey, Ananya; Vishwakarma, Dinesh Kumar
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.02571

Similar Items

Target-Dependent Multimodal Sentiment Analysis Via Employing Visual-to Emotional-Caption Translation Network using Visual-Caption Pairs
by: Pandey, Ananya, et al.
Published: (2024)

Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection
by: Aggarwal, Sajal, et al.
Published: (2024)

VyAnG-Net: A Novel Multi-Modal Sarcasm Recognition Model by Uncovering Visual, Acoustic and Glossary Features
by: Pandey, Ananya, et al.
Published: (2024)

LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
by: Girella, Federico, et al.
Published: (2025)

A Cross-Modal Rumor Detection Scheme via Contrastive Learning by Exploring Text and Image internal Correlations
by: Ma, Bin, et al.
Published: (2025)

Enhancing Contrastive Learning with Efficient Combinatorial Positive Pairing
by: Kim, Jaeill, et al.
Published: (2024)

Multi-Modal Language Models as Text-to-Image Model Evaluators
by: Chen, Jiahui, et al.
Published: (2025)

MoSAiC: Multi-Modal Multi-Label Supervision-Aware Contrastive Learning for Remote Sensing
by: Gupta, Debashis, et al.
Published: (2025)

WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
by: Wu, Jingjing, et al.
Published: (2024)

Towards Effective Image Forensics via A Novel Computationally Efficient Framework and A New Image Splice Dataset
by: Yadav, Ankit, et al.
Published: (2024)

A Visually Attentive Splice Localization Network with Multi-Domain Feature Extractor and Multi-Receptive Field Upsampler
by: Yadav, Ankit, et al.
Published: (2024)

MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data
by: Sheludzko, Siarhei, et al.
Published: (2026)

Spatial Transcriptomics Expression Prediction from Histopathology Based on Cross-Modal Mask Reconstruction and Contrastive Learning
by: Liu, Junzhuo, et al.
Published: (2025)

COT Flow: Learning Optimal-Transport Image Sampling and Editing by Contrastive Pairs
by: Zu, Xinrui, et al.
Published: (2024)

Kvasir-VQA: A Text-Image Pair GI Tract Dataset
by: Gautam, Sushant, et al.
Published: (2024)

Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs
by: Xian, Jia Jun Cheng, et al.
Published: (2025)

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
by: Zhou, Chunting, et al.
Published: (2024)

Exploiting Data Hierarchy as a New Modality for Contrastive Learning
by: Bhalla, Arjun, et al.
Published: (2024)

Architecture-Agnostic Modality-Isolated Gated Fusion for Robust Multi-Modal Prostate MRI Segmentation
by: Shu, Yongbo, et al.
Published: (2026)

Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
by: Han, Haochen, et al.
Published: (2024)

Learning from Gene Names, Expression Values and Images: Contrastive Masked Text-Image Pretraining for Spatial Transcriptomics Representation Learning
by: Qian, Jiahe, et al.
Published: (2025)

Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
by: Jiang, Chen, et al.
Published: (2023)

A Noise and Edge extraction-based dual-branch method for Shallowfake and Deepfake Localization
by: Dagar, Deepak, et al.
Published: (2024)

Multi-Modal Character Localization and Extraction for Chinese Text Recognition
by: Li, Qilong, et al.
Published: (2026)

Semantic-Consistent Bidirectional Contrastive Hashing for Noisy Multi-Label Cross-Modal Retrieval
by: Peng, Likang, et al.
Published: (2025)

Vision-Core Guided Contrastive Learning for Balanced Multi-modal Prognosis Prediction of Stroke
by: Chen, Liren, et al.
Published: (2026)

Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation
by: Agarwal, Lakshita, et al.
Published: (2025)

Multi-language Video Subtitle Dataset for Image-based Text Recognition
by: Singkhornart, Thanadol, et al.
Published: (2024)

Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
by: Yamabe, Shojiro, et al.
Published: (2025)

ClipTBP: Clip-Pair based Temporal Boundary Prediction with Boundary-Aware Learning for Moment Retrieval
by: Kim, Ji-Hyeon, et al.
Published: (2026)

Self-Contrastive Weakly Supervised Learning Framework for Prognostic Prediction Using Whole Slide Images
by: Fuster, Saul, et al.
Published: (2024)

Multi-level Asymmetric Contrastive Learning for Volumetric Medical Image Segmentation Pre-training
by: Zeng, Shuang, et al.
Published: (2023)

Multi-modal Contrastive Learning for Tumor-specific Missing Modality Synthesis
by: Lim, Minjoo, et al.
Published: (2025)

Face Detection: Present State and Research Directions
by: Prabhat, Purnendu, et al.
Published: (2024)

3D Architect: An Automated Approach to Three-Dimensional Modeling
by: Tiwari, Sunil, et al.
Published: (2026)

A Unified Model for Longitudinal Multi-Modal Multi-View Prediction with Missingness
by: Chen, Boqi, et al.
Published: (2024)

Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution
by: Wang, Ying, et al.
Published: (2023)

Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge
by: Berman, Nimrod, et al.
Published: (2025)

Vision Learners Meet Web Image-Text Pairs
by: Zhao, Bingchen, et al.
Published: (2023)

Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning
by: Yaras, Can, et al.
Published: (2024)