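Staff View: SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL :: Library Catalog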

Bibliographic Details
Main Authors:	Liu, Lijun; Chen, Linwei; Zhang, Zhishou; Tian, Meng; Cui, Hengfu; Li, Ruiyang; Liu, Zhaocheng; Ju, Qiang; Li, Qianxi; Zhou, Hong-Yu
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.09136

_version_	1866908763769798656
author	Liu, Lijun; Chen, Linwei; Zhang, Zhishou; Tian, Meng; Cui, Hengfu; Li, Ruiyang; Liu, Zhaocheng; Ju, Qiang; Li, Qianxi; Zhou, Hong-Yu
author_facet	Liu, Lijun; Chen, Linwei; Zhang, Zhishou; Tian, Meng; Cui, Hengfu; Li, Ruiyang; Liu, Zhaocheng; Ju, Qiang; Li, Qianxi; Zhou, Hong-Yu
contents	General-purpose Large Vision-Language Models (LVLMs), despite their massive scale, often falter in dermatology due to "diffuse attention" - the inability to disentangle subtle pathological lesions from background noise. In this paper, we challenge the assumption that parameter scaling is the only path to medical precision. We introduce SkinFlow, a framework that treats diagnosis as an optimization of visual information transmission efficiency. Our approach utilizes a Virtual-Width Dynamic Vision Encoder (DVE) to "unfold" complex pathological manifolds without physical parameter expansion, coupled with a two-stage Reinforcement Learning strategy. This strategy sequentially aligns explicit medical descriptions (Stage I) and reconstructs implicit diagnostic textures (Stage II) within a constrained semantic space. Furthermore, we propose a clinically grounded evaluation protocol that prioritizes diagnostic safety and hierarchical relevance over rigid label matching. Empirical results are compelling: our 7B model establishes a new state-of-the-art on the Fitzpatrick17k benchmark, achieving a +12.06% gain in Top-1 accuracy and a +28.57% boost in Top-6 accuracy over the massive general-purpose models (e.g., Qwen3VL-235B and GPT-5.2). These findings demonstrate that optimizing geometric capacity and information flow yields superior diagnostic reasoning compared to raw parameter scaling.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_09136
institution	arXiv
publishDate	2026
record_format	arxiv
title	SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2601.09136

Similar Items