| Main Authors: | Wang, Chenyu, Chen, Tianle, Ahmad, H. M. Sabbir, Batmanghelich, Kayhan, Li, Wenchao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.09214 |
Similar Items
Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance
by: Wang, Chenyu, et al.
Published: (2026)
Flexible Control of 3D CT Generation via Text and Semantically-Defined Segmentation Prompts
by: Dai, Weicheng, et al.
Published: (2026)
Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
by: Ghosh, Shantanu, et al.
Published: (2024)
LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers
by: Ghosh, Shantanu, et al.
Published: (2024)
Enhancing Biomedical Multi-modal Representation Learning with Multi-scale Pre-training and Perturbed Report Discrimination
by: Zhong, Xinliu, et al.
Published: (2025)
A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning
by: Chen, Tianle, et al.
Published: (2026)
Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation
by: Huang, Xiaoshuang, et al.
Published: (2024)
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
by: Li, Juncheng, et al.
Published: (2025)
Beyond Cross-Modal Alignment: Measuring and Leveraging Modality Gap in Vision-Language Models
by: Yan, Hanqi, et al.
Published: (2025)
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
by: Huang, Qidong, et al.
Published: (2024)
Causality-based Cross-Modal Representation Learning for Vision-and-Language Navigation
by: Wang, Liuyi, et al.
Published: (2024)
Cross-Modal Attention Guided Unlearning in Vision-Language Models
by: Bhaila, Karuna, et al.
Published: (2025)
Majorization-Guided Test-Time Adaptation for Vision-Language Models under Modality-Specific Shift
by: Chen, Lixian, et al.
Published: (2026)
Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models
by: Zhu, Tinghui, et al.
Published: (2024)
Quantifying Cross-Modality Memorization in Vision-Language Models
by: Wen, Yuxin, et al.
Published: (2025)
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
by: Jiang, Lei, et al.
Published: (2025)
BioGait-VLM: A Tri-Modal Vision-Language-Biomechanics Framework for Interpretable Clinical Gait Assessment
by: Chen, Erdong, et al.
Published: (2026)
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model
by: Feng, Qianhan, et al.
Published: (2024)
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
by: Liang, Yongyuan, et al.
Published: (2025)
BioVLM: Routing Prompts, Not Parameters, for Cross-Modality Generalization in Biomedical VLMs
by: Singha, Mainak, et al.
Published: (2026)
Distilling Vision-Language Pretraining for Efficient Cross-Modal Retrieval
by: Jang, Young Kyun, et al.
Published: (2024)
Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
by: Xu, Peng, et al.
Published: (2023)
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
by: Danish, Muhammad Sohail, et al.
Published: (2024)
Distilling Cross-Modal Knowledge via Feature Disentanglement
by: Liu, Junhong, et al.
Published: (2025)
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
by: Wang, Xingrui, et al.
Published: (2025)
SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models
by: Tang, Zhenwei, et al.
Published: (2025)
Cross-Modal Redundancy and the Geometry of Vision-Language Embeddings
by: Dhimoïla, Grégoire, et al.
Published: (2026)
FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
by: Liu, Zheng, et al.
Published: (2025)
GLIMPSE: Holistic Cross-Modal Explainability for Large Vision-Language Models
by: Shen, Guanxi
Published: (2025)
Capturing Gaze Shifts for Guidance: Cross-Modal Fusion Enhancement for VLM Hallucination Mitigation
by: Qi, Zheng, et al.
Published: (2025)
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
by: Sung-Bin, Kim, et al.
Published: (2024)
Cross-Modal Adapter for Vision-Language Retrieval
by: Jiang, Haojun, et al.
Published: (2022)
Firebolt-VL: Efficient Vision-Language Understanding with Cross-Modality Modulation
by: Trinh, Quoc-Huy, et al.
Published: (2026)
ConsensusDrop: Fusing Visual and Cross-Modal Saliency for Efficient Vision Language Models
by: Parikh, Dhruv, et al.
Published: (2026)
Cross-Modal Adapter: Parameter-Efficient Transfer Learning Approach for Vision-Language Models
by: Yang, Juncheng, et al.
Published: (2024)
Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)
Balanced Multi-modal Federated Learning via Cross-Modal Infiltration
by: Fan, Yunfeng, et al.
Published: (2023)
Q-CLIP: Unleashing the Power of Vision-Language Models for Video Quality Assessment through Unified Cross-Modal Adaptation
by: Mi, Yachun, et al.
Published: (2025)
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
by: Ma, Zhiyuan, et al.
Published: (2024)
EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data
by: Lin, Dongyan, et al.
Published: (2026)