| Main Authors: | He, Yuting, You, Chenyu, Li, Shuo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.21861 |
Similar Items
Homeomorphism Prior for False Positive and Negative Problem in Medical Image Dense Contrastive Representation Learning
by: He, Yuting, et al.
Published: (2025)
ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts
by: Choi, Sangbum, et al.
Published: (2025)
Adaptation of Multi-modal Representation Models for Multi-task Surgical Computer Vision
by: Walimbe, Soham, et al.
Published: (2025)
Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models
by: Jang, Young Kyun, et al.
Published: (2024)
Complementarity-driven Representation Learning for Multi-modal Knowledge Graph Completion
by: Li, Lijian
Published: (2025)
Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision
by: He, Yuting, et al.
Published: (2025)
Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation
by: Tan, Zhaorui, et al.
Published: (2024)
Exploring Efficient Foundational Multi-modal Models for Video Summarization
by: Samel, Karan, et al.
Published: (2024)
Multi-modal Relation Distillation for Unified 3D Representation Learning
by: Wang, Huiqun, et al.
Published: (2024)
Multi-modal Vision Pre-training for Medical Image Analysis
by: Rui, Shaohao, et al.
Published: (2024)
Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data
by: Si, Haozhe, et al.
Published: (2025)
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models
by: Lian, Chenyu, et al.
Published: (2025)
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
by: Yuan, Kun, et al.
Published: (2023)
Advancing Stroke Risk Prediction Using a Multi-modal Foundation Model
by: Delgrange, Camille, et al.
Published: (2024)
Anatomy-Anchored Self-Supervision: Distilling Vision Foundation Models for Invariant Ultrasound Representation
by: Zhu, Chunzheng, et al.
Published: (2026)
Explaining Multi-modal Large Language Models by Analyzing their Vision Perception
by: Giulivi, Loris, et al.
Published: (2024)
Beyond Boundaries: Leveraging Vision Foundation Models for Source-Free Object Detection
by: Yao, Huizai, et al.
Published: (2025)
Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading
by: Pan, Li, et al.
Published: (2024)
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
by: Li, Yanwei, et al.
Published: (2024)
REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction
by: Leem, Seowung, et al.
Published: (2026)
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
by: Li, Zhang, et al.
Published: (2025)
Vision-Core Guided Contrastive Learning for Balanced Multi-modal Prognosis Prediction of Stroke
by: Chen, Liren, et al.
Published: (2026)
Medical Large Vision Language Models with Multi-Image Visual Ability
by: Yang, Xikai, et al.
Published: (2025)
Multi-modal, Multi-task, Multi-criteria Automatic Evaluation with Vision Language Models
by: Ohi, Masanari, et al.
Published: (2024)
Imaging foundation model for universal enhancement of non-ideal measurement CT
by: Ge, Rongjun, et al.
Published: (2024)
M$^3$-Med: A Benchmark for Multi-lingual, Multi-modal, and Multi-hop Reasoning in Medical Instructional Video Understanding
by: Liu, Shenxi, et al.
Published: (2025)
Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning
by: Chen, Junkai, et al.
Published: (2026)
Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models
by: Li, Nanxi, et al.
Published: (2026)
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
by: Cheng, Zihui, et al.
Published: (2024)
Skill-Conditioned Visual Geolocation for Vision-Language Models
by: Yang, Chenjie, et al.
Published: (2026)
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
by: Li, Siyuan, et al.
Published: (2023)
Hypersolid: Emergent Vision Representations via Short-Range Repulsion
by: Rodríguez-Betancourt, Esteban, et al.
Published: (2026)
A World Model of Radiologist Reading for Medical Image Representation Learning
by: Li, Yiwei, et al.
Published: (2026)
Multimodal Medical Image Classification via Synergistic Learning Pre-training
by: Lin, Qinghua, et al.
Published: (2025)
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
by: Pan, Chenbin, et al.
Published: (2025)
Multi-modality Anomaly Segmentation on the Road
by: Gao, Heng, et al.
Published: (2025)
Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition
by: Baby, Britty, et al.
Published: (2025)
Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation
by: Zhou, Zhongliang, et al.
Published: (2024)
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
by: Li, Zhang, et al.
Published: (2023)
DCG ReID: Disentangling Collaboration and Guidance Fusion Representations for Multi-modal Vehicle Re-Identification
by: Zheng, Aihua, et al.
Published: (2026)