:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Chenyu; Chen, Tianle; Ahmad, H. M. Sabbir; Batmanghelich, Kayhan; Li, Wenchao
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.09214

Similar Items

Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance
by: Wang, Chenyu, et al.
Published: (2026)

Flexible Control of 3D CT Generation via Text and Semantically-Defined Segmentation Prompts
by: Dai, Weicheng, et al.
Published: (2026)

Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
by: Ghosh, Shantanu, et al.
Published: (2024)

LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers
by: Ghosh, Shantanu, et al.
Published: (2024)

Enhancing Biomedical Multi-modal Representation Learning with Multi-scale Pre-training and Perturbed Report Discrimination
by: Zhong, Xinliu, et al.
Published: (2025)

A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning
by: Chen, Tianle, et al.
Published: (2026)

Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation
by: Huang, Xiaoshuang, et al.
Published: (2024)

BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
by: Li, Juncheng, et al.
Published: (2025)

Beyond Cross-Modal Alignment: Measuring and Leveraging Modality Gap in Vision-Language Models
by: Yan, Hanqi, et al.
Published: (2025)

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
by: Huang, Qidong, et al.
Published: (2024)

Causality-based Cross-Modal Representation Learning for Vision-and-Language Navigation
by: Wang, Liuyi, et al.
Published: (2024)

Cross-Modal Attention Guided Unlearning in Vision-Language Models
by: Bhaila, Karuna, et al.
Published: (2025)

Majorization-Guided Test-Time Adaptation for Vision-Language Models under Modality-Specific Shift
by: Chen, Lixian, et al.
Published: (2026)

Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models
by: Zhu, Tinghui, et al.
Published: (2024)

Quantifying Cross-Modality Memorization in Vision-Language Models
by: Wen, Yuxin, et al.
Published: (2025)

Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
by: Jiang, Lei, et al.
Published: (2025)

BioGait-VLM: A Tri-Modal Vision-Language-Biomechanics Framework for Interpretable Clinical Gait Assessment
by: Chen, Erdong, et al.
Published: (2026)

Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model
by: Feng, Qianhan, et al.
Published: (2024)

ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
by: Liang, Yongyuan, et al.
Published: (2025)

BioVLM: Routing Prompts, Not Parameters, for Cross-Modality Generalization in Biomedical VLMs
by: Singha, Mainak, et al.
Published: (2026)

Distilling Vision-Language Pretraining for Efficient Cross-Modal Retrieval
by: Jang, Young Kyun, et al.
Published: (2024)

Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
by: Xu, Peng, et al.
Published: (2023)

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
by: Danish, Muhammad Sohail, et al.
Published: (2024)

Distilling Cross-Modal Knowledge via Feature Disentanglement
by: Liu, Junhong, et al.
Published: (2025)

XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
by: Wang, Xingrui, et al.
Published: (2025)

SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models
by: Tang, Zhenwei, et al.
Published: (2025)

Cross-Modal Redundancy and the Geometry of Vision-Language Embeddings
by: Dhimoïla, Grégoire, et al.
Published: (2026)

FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
by: Liu, Zheng, et al.
Published: (2025)

GLIMPSE: Holistic Cross-Modal Explainability for Large Vision-Language Models
by: Shen, Guanxi
Published: (2025)

Capturing Gaze Shifts for Guidance: Cross-Modal Fusion Enhancement for VLM Hallucination Mitigation
by: Qi, Zheng, et al.
Published: (2025)

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
by: Sung-Bin, Kim, et al.
Published: (2024)

Cross-Modal Adapter for Vision-Language Retrieval
by: Jiang, Haojun, et al.
Published: (2022)

Firebolt-VL: Efficient Vision-Language Understanding with Cross-Modality Modulation
by: Trinh, Quoc-Huy, et al.
Published: (2026)

ConsensusDrop: Fusing Visual and Cross-Modal Saliency for Efficient Vision Language Models
by: Parikh, Dhruv, et al.
Published: (2026)

Cross-Modal Adapter: Parameter-Efficient Transfer Learning Approach for Vision-Language Models
by: Yang, Juncheng, et al.
Published: (2024)

Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)

Balanced Multi-modal Federated Learning via Cross-Modal Infiltration
by: Fan, Yunfeng, et al.
Published: (2023)

Q-CLIP: Unleashing the Power of Vision-Language Models for Video Quality Assessment through Unified Cross-Modal Adaptation
by: Mi, Yachun, et al.
Published: (2025)

CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
by: Ma, Zhiyuan, et al.
Published: (2024)

EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data
by: Lin, Dongyan, et al.
Published: (2026)