| Main Authors: | Pandey, Ananya; Vishwakarma, Dinesh Kumar |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.02571 |
Similar Items
Target-Dependent Multimodal Sentiment Analysis Via Employing Visual-to Emotional-Caption Translation Network using Visual-Caption Pairs
by: Pandey, Ananya, et al.
Published: (2024)
Modelling Visual Semantics via Image Captioning to extract Enhanced Multi-Level Cross-Modal Semantic Incongruity Representation with Attention for Multimodal Sarcasm Detection
by: Aggarwal, Sajal, et al.
Published: (2024)
VyAnG-Net: A Novel Multi-Modal Sarcasm Recognition Model by Uncovering Visual, Acoustic and Glossary Features
by: Pandey, Ananya, et al.
Published: (2024)
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
by: Girella, Federico, et al.
Published: (2025)
A Cross-Modal Rumor Detection Scheme via Contrastive Learning by Exploring Text and Image internal Correlations
by: Ma, Bin, et al.
Published: (2025)
Enhancing Contrastive Learning with Efficient Combinatorial Positive Pairing
by: Kim, Jaeill, et al.
Published: (2024)
Multi-Modal Language Models as Text-to-Image Model Evaluators
by: Chen, Jiahui, et al.
Published: (2025)
MoSAiC: Multi-Modal Multi-Label Supervision-Aware Contrastive Learning for Remote Sensing
by: Gupta, Debashis, et al.
Published: (2025)
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
by: Wu, Jingjing, et al.
Published: (2024)
Towards Effective Image Forensics via A Novel Computationally Efficient Framework and A New Image Splice Dataset
by: Yadav, Ankit, et al.
Published: (2024)
A Visually Attentive Splice Localization Network with Multi-Domain Feature Extractor and Multi-Receptive Field Upsampler
by: Yadav, Ankit, et al.
Published: (2024)
MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data
by: Sheludzko, Siarhei, et al.
Published: (2026)
Spatial Transcriptomics Expression Prediction from Histopathology Based on Cross-Modal Mask Reconstruction and Contrastive Learning
by: Liu, Junzhuo, et al.
Published: (2025)
COT Flow: Learning Optimal-Transport Image Sampling and Editing by Contrastive Pairs
by: Zu, Xinrui, et al.
Published: (2024)
Kvasir-VQA: A Text-Image Pair GI Tract Dataset
by: Gautam, Sushant, et al.
Published: (2024)
Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs
by: Xian, Jia Jun Cheng, et al.
Published: (2025)
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
by: Zhou, Chunting, et al.
Published: (2024)
Exploiting Data Hierarchy as a New Modality for Contrastive Learning
by: Bhalla, Arjun, et al.
Published: (2024)
Architecture-Agnostic Modality-Isolated Gated Fusion for Robust Multi-Modal Prostate MRI Segmentation
by: Shu, Yongbo, et al.
Published: (2026)
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
by: Han, Haochen, et al.
Published: (2024)
Learning from Gene Names, Expression Values and Images: Contrastive Masked Text-Image Pretraining for Spatial Transcriptomics Representation Learning
by: Qian, Jiahe, et al.
Published: (2025)
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
by: Jiang, Chen, et al.
Published: (2023)
A Noise and Edge extraction-based dual-branch method for Shallowfake and Deepfake Localization
by: Dagar, Deepak, et al.
Published: (2024)
Multi-Modal Character Localization and Extraction for Chinese Text Recognition
by: Li, Qilong, et al.
Published: (2026)
Semantic-Consistent Bidirectional Contrastive Hashing for Noisy Multi-Label Cross-Modal Retrieval
by: Peng, Likang, et al.
Published: (2025)
Vision-Core Guided Contrastive Learning for Balanced Multi-modal Prognosis Prediction of Stroke
by: Chen, Liren, et al.
Published: (2026)
Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation
by: Agarwal, Lakshita, et al.
Published: (2025)
Multi-language Video Subtitle Dataset for Image-based Text Recognition
by: Singkhornart, Thanadol, et al.
Published: (2024)
Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
by: Yamabe, Shojiro, et al.
Published: (2025)
ClipTBP: Clip-Pair based Temporal Boundary Prediction with Boundary-Aware Learning for Moment Retrieval
by: Kim, Ji-Hyeon, et al.
Published: (2026)
Self-Contrastive Weakly Supervised Learning Framework for Prognostic Prediction Using Whole Slide Images
by: Fuster, Saul, et al.
Published: (2024)
Multi-level Asymmetric Contrastive Learning for Volumetric Medical Image Segmentation Pre-training
by: Zeng, Shuang, et al.
Published: (2023)
Multi-modal Contrastive Learning for Tumor-specific Missing Modality Synthesis
by: Lim, Minjoo, et al.
Published: (2025)
Face Detection: Present State and Research Directions
by: Prabhat, Purnendu, et al.
Published: (2024)
3D Architect: An Automated Approach to Three-Dimensional Modeling
by: Tiwari, Sunil, et al.
Published: (2026)
A Unified Model for Longitudinal Multi-Modal Multi-View Prediction with Missingness
by: Chen, Boqi, et al.
Published: (2024)
Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution
by: Wang, Ying, et al.
Published: (2023)
Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge
by: Berman, Nimrod, et al.
Published: (2025)
Vision Learners Meet Web Image-Text Pairs
by: Zhao, Bingchen, et al.
Published: (2023)
Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning
by: Yaras, Can, et al.
Published: (2024)