Staff View: Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning :: Library Catalog

Bibliographic Details
Main Authors:	Li, Jiajie; Xu, Chenhui; Liu, Meihuan; Xiong, Jinjun
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.20116

_version_	1866914411671715840
author	Li, Jiajie Xu, Chenhui Liu, Meihuan Xiong, Jinjun
author_facet	Li, Jiajie Xu, Chenhui Liu, Meihuan Xiong, Jinjun
contents	Conventional fine-tuning on domain-specific datasets can inadvertently alter a model's pretrained multimodal priors, leading to reduced generalization. To address this, we propose Chain-of-Adaptation (CoA), an adaptation framework designed to integrate domain knowledge while maintaining the model's inherent reasoning and perceptual capabilities. CoA introduces a structured reasoning format and uses reinforcement learning to enhance domain alignment without sacrificing general multimodal competence. Experiments on standard surgical benchmarks, under both in-distribution and out-of-distribution settings, demonstrate that CoA achieves higher accuracy, stronger generalization, and more stable behavior than supervised fine-tuning. Furthermore, ablation studies confirm that CoA effectively preserves the model's core visual-language abilities, providing a reliable pathway for domain specialization in vision-language models (VLMs).
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_20116
institution	arXiv
publishDate	2026
record_format	arxiv
title	Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning
topic	Computer Vision and Pattern Recognition Artificial Intelligence
url	https://arxiv.org/abs/2603.20116