| Main Authors: | Huang, Xinrui, Xiao, Fan, He, Dongming, Gao, Anqi, Li, Dandan, Zhang, Xiaofan, Zhang, Shaoting, Wang, Xudong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.14532 |
Similar Items
PathoTune: Adapting Visual Foundation Model to Pathological Specialists
by: Lu, Jiaxuan, et al.
Published: (2024)
MedDiff-FM: A Diffusion-based Foundation Model for Versatile Medical Image Applications
by: Yu, Yongrui, et al.
Published: (2024)
A Synthetic Data-Driven Radiology Foundation Model for Pan-tumor Clinical Diagnosis
by: Lei, Wenhui, et al.
Published: (2025)
Unifying Multiple Foundation Models for Advanced Computational Pathology
by: Lei, Wenhui, et al.
Published: (2025)
Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment
by: Jiang, Yankai, et al.
Published: (2024)
Vision Foundation Models as Generalist Tokenizers for Image Generation
by: Zheng, Anlin, et al.
Published: (2026)
Modality-Aware and Shift Mixer for Multi-modal Brain Tumor Segmentation
by: Huang, Zhongzhen, et al.
Published: (2024)
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting
by: Jiang, Yankai, et al.
Published: (2023)
CAT: Coordinating Anatomical-Textual Prompts for Multi-Organ and Tumor Segmentation
by: Huang, Zhongzhen, et al.
Published: (2024)
VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
by: Qiu, Jianing, et al.
Published: (2023)
MedLSAM: Localize and Segment Anything Model for 3D CT Images
by: Lei, Wenhui, et al.
Published: (2023)
Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse
by: Zhang, Kuan, et al.
Published: (2026)
DRScaffold: Boosting Dense-Scene Reasoning in Lightweight Vision Language Models
by: Shi, Xinrui, et al.
Published: (2026)
DeReStainer: H&E to IHC Pathological Image Translation via Decoupled Staining Channels
by: Wei, Linda, et al.
Published: (2024)
Masked AutoDecoder is Effective Multi-Task Vision Generalist
by: Qiu, Han, et al.
Published: (2024)
CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers
by: Gu, Yannian, et al.
Published: (2026)
Towards Unbiased Source-Free Object Detection via Vision Foundation Models
by: Cai, Zhi, et al.
Published: (2026)
Toward a Diffusion-Based Generalist for Dense Vision Tasks
by: Fan, Yue, et al.
Published: (2024)
EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging
by: Shi, Danli, et al.
Published: (2024)
MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression
by: Mu, Linjie, et al.
Published: (2025)
OmniMRI: A Unified Vision-Language Foundation Model for Generalist MRI Interpretation
by: He, Xingxin, et al.
Published: (2025)
One for All: Toward Unified Foundation Models for Earth Vision
by: Xiong, Zhitong, et al.
Published: (2024)
Interactive Segmentation and Report Generation for CT Images
by: Gu, Yannian, et al.
Published: (2025)
OmniFashion: Towards Generalist Fashion Intelligence via Multi-Task Vision-Language Learning
by: Yang, Zhengwei, et al.
Published: (2026)
Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
by: Wang, Zhao, et al.
Published: (2023)
OctoNav: Towards Generalist Embodied Navigation
by: Gao, Chen, et al.
Published: (2025)
Towards Training-free Anomaly Detection with Vision and Language Foundation Models
by: Zhang, Jinjin, et al.
Published: (2025)
Forging a Dynamic Memory: Retrieval-Guided Continual Learning for Generalist Medical Foundation Models
by: Chen, Zizhi, et al.
Published: (2025)
MedCAL-Bench: A Comprehensive Benchmark on Cold-Start Active Learning with Foundation Models for Medical Image Analysis
by: Zhu, Ning, et al.
Published: (2025)
Background Adaptation with Residual Modeling for Exemplar-Free Class-Incremental Semantic Segmentation
by: Zhang, Anqi, et al.
Published: (2024)
Unifying Biomedical Vision-Language Expertise: Towards a Generalist Foundation Model via Multi-CLIP Knowledge Distillation
by: Wang, Shansong, et al.
Published: (2025)
MetaDent: Labeling Clinical Images for Vision-Language Models in Dentistry
by: Li, Meng-Xun, et al.
Published: (2026)
TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction
by: Zhang, Fengyi, et al.
Published: (2025)
BRIGHT: A Collaborative Generalist-Specialist Foundation Model for Breast Pathology
by: Guo, Xiaojing, et al.
Published: (2026)
Medical Vision Generalist: Unifying Medical Imaging Tasks in Context
by: Ren, Sucheng, et al.
Published: (2024)
What Matters in Building Vision-Language-Action Models for Generalist Robots
by: Li, Xinghang, et al.
Published: (2024)
OpenPath: Open-Set Active Learning for Pathology Image Classification via Pre-trained Vision-Language Models
by: Zhong, Lanfeng, et al.
Published: (2025)
VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Annotation-Free Pathological Image Classification
by: Zhong, Lanfeng, et al.
Published: (2024)
Fairness Analysis of CLIP-Based Foundation Models for X-Ray Image Classification
by: Sun, Xiangyu, et al.
Published: (2025)
Vision Generalist Model: A Survey
by: Wang, Ziyi, et al.
Published: (2025)