| Main Authors: | Ye, Yaoqin, Zhang, Junjie, Shi, Hongwei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.06468 |
Similar Items
Anatomical Structure-Guided Medical Vision-Language Pre-training
by: Li, Qingqiu, et al.
Published: (2024)
MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models
by: Miao, Yongzhu, et al.
Published: (2023)
VLP: A Survey on Vision-Language Pre-training
by: Chen, Feilong, et al.
Published: (2022)
TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training
by: Jiang, Chaoya, et al.
Published: (2023)
Leveraging Vision-Language Pre-training for Human Activity Recognition in Still Images
by: Mahanta, Cristina, et al.
Published: (2025)
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
by: Xia, Renqiu, et al.
Published: (2024)
Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations
by: Zhu, Kangyu, et al.
Published: (2025)
Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity
by: Shuai, Zitao, et al.
Published: (2024)
Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training
by: Bawazir, Ameera, et al.
Published: (2024)
Superpixel Semantics Representation and Pre-training for Vision-Language Task
by: Zhang, Siyu, et al.
Published: (2023)
Understanding the Multi-modal Prompts of the Pre-trained Vision-Language Model
by: Ma, Shuailei, et al.
Published: (2023)
From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision-Language Models
by: Wang, Qidong, et al.
Published: (2026)
An Explainable Biomedical Foundation Model via Large-Scale Concept-Enhanced Vision-Language Pre-training
by: Nie, Yuxiang, et al.
Published: (2025)
Foundation Model-guided Iteratively Prompting and Pseudo-Labeling for Partially Labeled Medical Image Segmentation
by: Zhao, Qiaochu, et al.
Published: (2026)
TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models
by: Zhang, Junyi, et al.
Published: (2025)
Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation
by: Zhang, Yuhui, et al.
Published: (2023)
Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts
by: Zhang, Zhaoyang, et al.
Published: (2023)
Text Prompt Injection of Vision Language Models
by: Zhu, Ruizhe
Published: (2025)
CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification
by: Patrício, Cristiano, et al.
Published: (2025)
VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Annotation-Free Pathological Image Classification
by: Zhong, Lanfeng, et al.
Published: (2024)
Enhancing Fine-Grained Image Classifications via Cascaded Vision Language Models
by: Wei, Canshi
Published: (2024)
Prompt-Induced Score Variance in Zero-Shot Binary Vision-Language Safety Classification
by: Weng, Charles, et al.
Published: (2026)
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
by: Liu, Haowei, et al.
Published: (2024)
EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training
by: Du, Yiyang, et al.
Published: (2026)
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
by: Jia, Mengzhao, et al.
Published: (2024)
How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey
by: Qi, Yayun, et al.
Published: (2024)
Pre-trained Vision-Language Models Learn Discoverable Visual Concepts
by: Zang, Yuan, et al.
Published: (2024)
T3D: Advancing 3D Medical Vision-Language Pre-training by Learning Multi-View Visual Consistency
by: Liu, Che, et al.
Published: (2023)
Enhancing Vision-Language Model Pre-training with Image-text Pair Pruning Based on Word Frequency
by: Liang, Mingliang, et al.
Published: (2024)
Retrieval-augmented Prompt Learning for Pre-trained Foundation Models
by: Chen, Xiang, et al.
Published: (2025)
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation
by: Kumar, Gurucharan Marthi Krishna, et al.
Published: (2024)
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
by: Jin, Haibo, et al.
Published: (2023)
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
by: Oh, Youngtaek, et al.
Published: (2024)
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
by: Chen, Yangyi, et al.
Published: (2025)
Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models
by: Jiang, Yifan, et al.
Published: (2026)
Robust Pre-Training of Medical Vision-and-Language Models with Domain-Invariant Multi-Modal Masked Reconstruction
by: Filvantorkaman, Melika, et al.
Published: (2026)
Medical Context Distorts Decisions in Clinical Vision Language Models
by: Restrepo, David, et al.
Published: (2026)
HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models
by: Jiang, Songtao, et al.
Published: (2025)
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models
by: Adhikari, Rabin, et al.
Published: (2024)
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models
by: Koleilat, Taha, et al.
Published: (2024)