:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Jiaxi; Hu, Wenhui; Liu, Xueyang; Wu, Beihu; Qiu, Yuting; Cai, YingYing
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2312.17648

Similar Items

CrossFlowDG: Bridging the Modality Gap with Cross-modal Flow Matching for Domain Generalization
by: Kritikos, Antonios, et al.
Published: (2026)

Learnable Cross-modal Knowledge Distillation for Multi-modal Learning with Missing Modality
by: Wang, Hu, et al.
Published: (2023)

Asymmetric Cross-Modal Knowledge Distillation: Bridging Modalities with Weak Semantic Consistency
by: Wei, Riling, et al.
Published: (2025)

Bridging the Intent Gap: Knowledge-Enhanced Visual Generation
by: Cheng, Yi, et al.
Published: (2024)

Bridging the Gap in Missing Modalities: Leveraging Knowledge Distillation and Style Matching for Brain Tumor Segmentation
by: Zhu, Shenghao, et al.
Published: (2025)

Bridging the Gap: Multi-Level Cross-Modality Joint Alignment for Visible-Infrared Person Re-Identification
by: Liang, Tengfei, et al.
Published: (2023)

Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior
by: Wu, Haitao, et al.
Published: (2025)

SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention
by: Xiao, Feng, et al.
Published: (2024)

Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain
by: Chao, Lianying, et al.
Published: (2026)

DriveXQA: Cross-modal Visual Question Answering for Adverse Driving Scene Understanding
by: Tao, Mingzhe, et al.
Published: (2026)

KG-ViP: Bridging Knowledge Grounding and Visual Perception in Multi-modal LLMs for Visual Question Answering
by: Li, Zhiyang, et al.
Published: (2026)

Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation
by: Zheng, Xu, et al.
Published: (2024)

Non-target Divergence Hypothesis: Toward Understanding Domain Gaps in Cross-Modal Knowledge Distillation
by: Chen, Yilong, et al.
Published: (2024)

Manipulating Multimodal Agents via Cross-Modal Prompt Injection
by: Wang, Le, et al.
Published: (2025)

Adaptive Perception for Unified Visual Multi-modal Object Tracking
by: Hu, Xiantao, et al.
Published: (2025)

Mind the Modality Gap: Towards a Remote Sensing Vision-Language Model via Cross-modal Alignment
by: Zavras, Angelos, et al.
Published: (2024)

Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
by: Jiang, Xilin, et al.
Published: (2025)

Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
by: Mistretta, Marco, et al.
Published: (2025)

Fusion-then-Distillation: Toward Cross-modal Positive Distillation for Domain Adaptive 3D Semantic Segmentation
by: Wu, Yao, et al.
Published: (2024)

Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion
by: Li, Xilai, et al.
Published: (2023)

MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification
by: Zhao, Yujian, et al.
Published: (2025)

Breaking Modality Gap in RGBT Tracking: Coupled Knowledge Distillation
by: Lu, Andong, et al.
Published: (2024)

Multi-level Cross-modal Alignment for Image Clustering
by: Qiu, Liping, et al.
Published: (2024)

Multi-Modality Distillation via Learning the teacher's modality-level Gram Matrix
by: Liu, Peng
Published: (2021)

CrossWeaver: Cross-modal Weaving for Arbitrary-Modality Semantic Segmentation
by: Zhang, Zelin, et al.
Published: (2026)

Bridging the Semantic-Action Gap in Visual Token Pruning for Efficient VLA Inference
by: Liu, Ziyan, et al.
Published: (2025)

Phrase Decoupling Cross-Modal Hierarchical Matching and Progressive Position Correction for Visual Grounding
by: Xie, Minghong, et al.
Published: (2024)

Visual Grounding with Multi-modal Conditional Adaptation
by: Yao, Ruilin, et al.
Published: (2024)

Revisiting Cross-Architecture Distillation: Adaptive Dual-Teacher Transfer for Lightweight Video Models
by: Peng, Ying, et al.
Published: (2025)

SIGMA: Bridging Structural and Distributional Gaps for Vision Foundation Model Adaptation
by: Xiong, Lingyu, et al.
Published: (2026)

Lightweight Contrastive Distilled Hashing for Online Cross-modal Retrieval
by: Li, Jiaxing, et al.
Published: (2025)

Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling
by: Zhang, Min, et al.
Published: (2024)

Bridging the Inter-Domain Gap through Low-Level Features for Cross-Modal Medical Image Segmentation
by: Lyu, Pengfei, et al.
Published: (2025)

RGBX-R1: Visual Modality Chain-of-Thought Guided Reinforcement Learning for Multimodal Grounding
by: Wu, Jiahe, et al.
Published: (2026)

S2HPruner: Soft-to-Hard Distillation Bridges the Discretization Gap in Pruning
by: Lin, Weihao, et al.
Published: (2024)

OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction
by: Li, Leheng, et al.
Published: (2024)

Bridging Cognitive Gap: Hierarchical Description Learning for Artistic Image Aesthetics Assessment
by: Liu, Henglin, et al.
Published: (2025)

DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
by: He, Jing, et al.
Published: (2024)

Multi-modal Generation via Cross-Modal In-Context Learning
by: Kumar, Amandeep, et al.
Published: (2024)

CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation
by: Govindarajan, Hariprasath, et al.
Published: (2025)