:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Xinrui; Xiao, Fan; He, Dongming; Gao, Anqi; Li, Dandan; Zhang, Xiaofan; Zhang, Shaoting; Wang, Xudong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.14532

Similar Items

PathoTune: Adapting Visual Foundation Model to Pathological Specialists
by: Lu, Jiaxuan, et al.
Published: (2024)

MedDiff-FM: A Diffusion-based Foundation Model for Versatile Medical Image Applications
by: Yu, Yongrui, et al.
Published: (2024)

A Synthetic Data-Driven Radiology Foundation Model for Pan-tumor Clinical Diagnosis
by: Lei, Wenhui, et al.
Published: (2025)

Unifying Multiple Foundation Models for Advanced Computational Pathology
by: Lei, Wenhui, et al.
Published: (2025)

Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment
by: Jiang, Yankai, et al.
Published: (2024)

Vision Foundation Models as Generalist Tokenizers for Image Generation
by: Zheng, Anlin, et al.
Published: (2026)

Modality-Aware and Shift Mixer for Multi-modal Brain Tumor Segmentation
by: Huang, Zhongzhen, et al.
Published: (2024)

ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting
by: Jiang, Yankai, et al.
Published: (2023)

CAT: Coordinating Anatomical-Textual Prompts for Multi-Organ and Tumor Segmentation
by: Huang, Zhongzhen, et al.
Published: (2024)

VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
by: Qiu, Jianing, et al.
Published: (2023)

MedLSAM: Localize and Segment Anything Model for 3D CT Images
by: Lei, Wenhui, et al.
Published: (2023)

Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse
by: Zhang, Kuan, et al.
Published: (2026)

DRScaffold: Boosting Dense-Scene Reasoning in Lightweight Vision Language Models
by: Shi, Xinrui, et al.
Published: (2026)

DeReStainer: H&E to IHC Pathological Image Translation via Decoupled Staining Channels
by: Wei, Linda, et al.
Published: (2024)

Masked AutoDecoder is Effective Multi-Task Vision Generalist
by: Qiu, Han, et al.
Published: (2024)

CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers
by: Gu, Yannian, et al.
Published: (2026)

Towards Unbiased Source-Free Object Detection via Vision Foundation Models
by: Cai, Zhi, et al.
Published: (2026)

Toward a Diffusion-Based Generalist for Dense Vision Tasks
by: Fan, Yue, et al.
Published: (2024)

EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging
by: Shi, Danli, et al.
Published: (2024)

MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression
by: Mu, Linjie, et al.
Published: (2025)

OmniMRI: A Unified Vision-Language Foundation Model for Generalist MRI Interpretation
by: He, Xingxin, et al.
Published: (2025)

One for All: Toward Unified Foundation Models for Earth Vision
by: Xiong, Zhitong, et al.
Published: (2024)

Interactive Segmentation and Report Generation for CT Images
by: Gu, Yannian, et al.
Published: (2025)

OmniFashion: Towards Generalist Fashion Intelligence via Multi-Task Vision-Language Learning
by: Yang, Zhengwei, et al.
Published: (2026)

Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
by: Wang, Zhao, et al.
Published: (2023)

OctoNav: Towards Generalist Embodied Navigation
by: Gao, Chen, et al.
Published: (2025)

Towards Training-free Anomaly Detection with Vision and Language Foundation Models
by: Zhang, Jinjin, et al.
Published: (2025)

Forging a Dynamic Memory: Retrieval-Guided Continual Learning for Generalist Medical Foundation Models
by: Chen, Zizhi, et al.
Published: (2025)

MedCAL-Bench: A Comprehensive Benchmark on Cold-Start Active Learning with Foundation Models for Medical Image Analysis
by: Zhu, Ning, et al.
Published: (2025)

Background Adaptation with Residual Modeling for Exemplar-Free Class-Incremental Semantic Segmentation
by: Zhang, Anqi, et al.
Published: (2024)

Unifying Biomedical Vision-Language Expertise: Towards a Generalist Foundation Model via Multi-CLIP Knowledge Distillation
by: Wang, Shansong, et al.
Published: (2025)

MetaDent: Labeling Clinical Images for Vision-Language Models in Dentistry
by: Li, Meng-Xun, et al.
Published: (2026)

TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction
by: Zhang, Fengyi, et al.
Published: (2025)

BRIGHT: A Collaborative Generalist-Specialist Foundation Model for Breast Pathology
by: Guo, Xiaojing, et al.
Published: (2026)

Medical Vision Generalist: Unifying Medical Imaging Tasks in Context
by: Ren, Sucheng, et al.
Published: (2024)

What Matters in Building Vision-Language-Action Models for Generalist Robots
by: Li, Xinghang, et al.
Published: (2024)

OpenPath: Open-Set Active Learning for Pathology Image Classification via Pre-trained Vision-Language Models
by: Zhong, Lanfeng, et al.
Published: (2025)

VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Annotation-Free Pathological Image Classification
by: Zhong, Lanfeng, et al.
Published: (2024)

Fairness Analysis of CLIP-Based Foundation Models for X-Ray Image Classification
by: Sun, Xiangyu, et al.
Published: (2025)

Vision Generalist Model: A Survey
by: Wang, Ziyi, et al.
Published: (2025)