| Main Authors: | Zhang, Ziyang; Yu, Yang; Yang, Xulei; Yeo, Si Yong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.12108 |
Similar Items
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
by: Zhang, Ziyang, et al.
Published: (2025)
Future-Aware Interaction Network For Motion Forecasting
by: Li, Shijie, et al.
Published: (2025)
RIHA: Report-Image Hierarchical Alignment for Radiology Report Generation
by: Chen, Yucheng, et al.
Published: (2026)
MonoSR: Open-Vocabulary Spatial Reasoning from Monocular Images
by: Wang, Qirui, et al.
Published: (2025)
Pre-training Everywhere: Parameter-Efficient Fine-Tuning for Medical Image Analysis via Target Parameter Pre-training
by: Lei, Xingliang, et al.
Published: (2024)
Superpixel Semantics Representation and Pre-training for Vision-Language Task
by: Zhang, Siyu, et al.
Published: (2023)
ELIP: Efficient Discriminative Language-Image Pre-training with Fewer Vision Tokens
by: Guo, Yangyang, et al.
Published: (2023)
Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models
by: Yang, Fan, et al.
Published: (2024)
SCALE-VLP: Soft-Weighted Contrastive Volumetric Vision-Language Pre-training with Spatial-Knowledge Semantics
by: Mahdizadeh, Ailar, et al.
Published: (2025)
MedFILIP: Medical Fine-grained Language-Image Pre-training
by: Liang, Xinjie, et al.
Published: (2025)
Med-GLIP: Advancing Medical Language-Image Pre-training with Large-scale Grounded Dataset
by: Deng, Ziye, et al.
Published: (2025)
Efficient Vision-Language Pre-training by Cluster Masking
by: Wei, Zihao, et al.
Published: (2024)
MLIP: Medical Language-Image Pre-training with Masked Local Representation Learning
by: Liu, Jiarun, et al.
Published: (2024)
UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment
by: Zhou, Hantao, et al.
Published: (2024)
An Efficient 3D Convolutional Neural Network with Channel-wise, Spatial-grouped, and Temporal Convolutions
by: Wang, Zhe, et al.
Published: (2025)
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training
by: Tao, Yiyi, et al.
Published: (2024)
Volumetric Environment Representation for Vision-Language Navigation
by: Liu, Rui, et al.
Published: (2024)
An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model
by: Tian, Yuxin, et al.
Published: (2024)
Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model
by: Shi, Jiang-Xin, et al.
Published: (2024)
Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
by: Ye, Wei, et al.
Published: (2024)
Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation
by: Dong, Wei, et al.
Published: (2024)
TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
by: Jiang, Chaoya, et al.
Published: (2023)
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
by: Hua, Hang, et al.
Published: (2024)
Unsupervised Network for Single Image Raindrop Removal
by: Wang, Huijiao, et al.
Published: (2024)
MedCutMix: A Data-Centric Approach to Improve Radiology Vision-Language Pre-training with Disease Awareness
by: Wang, Sinuo, et al.
Published: (2025)
VividMed: Vision Language Model with Versatile Visual Grounding for Medicine
by: Luo, Lingxiao, et al.
Published: (2024)
Learning to Rank Pre-trained Vision-Language Models for Downstream Tasks
by: Ding, Yuhe, et al.
Published: (2024)
Centroid-centered Modeling for Efficient Vision Transformer Pre-training
by: Yan, Xin, et al.
Published: (2023)
FashionFAE: Fine-grained Attributes Enhanced Fashion Vision-Language Pre-training
by: Huang, Jiale, et al.
Published: (2024)
Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves
by: Wu, Shihan, et al.
Published: (2024)
Multi-level Asymmetric Contrastive Learning for Volumetric Medical Image Segmentation Pre-training
by: Zeng, Shuang, et al.
Published: (2023)
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
by: Liu, Haowei, et al.
Published: (2024)
Muskie: Multi-view Masked Image Modeling for 3D Vision Pre-training
by: Li, Wenyu, et al.
Published: (2025)
Med-Tuning: A New Parameter-Efficient Tuning Framework for Medical Volumetric Segmentation
by: Shen, Jiachen, et al.
Published: (2023)
Continual Retinal Vision-Language Pre-training upon Incremental Imaging Modalities
by: Yao, Yuang, et al.
Published: (2025)
AMF-MedIT: An Efficient Align-Modulation-Fusion Framework for Medical Image-Tabular Data
by: Yu, Congjing, et al.
Published: (2025)
SAM-Med3D: Towards General-purpose Segmentation Models for Volumetric Medical Images
by: Wang, Haoyu, et al.
Published: (2023)
Zero-Shot 3D Visual Grounding from Vision-Language Models
by: Li, Rong, et al.
Published: (2025)
Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding
by: Shui, Zhongyi, et al.
Published: (2025)
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
by: Yin, Ziyi, et al.
Published: (2023)