Staff View: Distilled Large Language Model-Driven Dynamic Sparse Expert Activation Mechanism :: Library Catalog

Bibliographic Details
Main Authors:	Chen, Qinghui; Zhang, Zekai; Zhang, Zaigui; Zhang, Kai; Li, Dagang; Wang, Wenmin; Zhang, Jinglin; Liu, Cong
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.26735

_version_	1866912985460506624
author	Chen, Qinghui; Zhang, Zekai; Zhang, Zaigui; Zhang, Kai; Li, Dagang; Wang, Wenmin; Zhang, Jinglin; Liu, Cong
author_facet	Chen, Qinghui; Zhang, Zekai; Zhang, Zaigui; Zhang, Kai; Li, Dagang; Wang, Wenmin; Zhang, Jinglin; Liu, Cong
contents	High inter-class similarity, extreme scale variation, and limited computational budgets hinder reliable visual recognition across diverse real-world data. Existing vision-centric and cross-modal approaches often rely on rigid fusion mechanisms and heavy annotation pipelines, leading to sub-optimal generalization. We propose the Distilled Large Language Model (LLM)-Driven Sparse Mixture-of-Experts (DS-MoE) framework, which integrates text-guided dynamic routing and lightweight multi-scale comprehension. The DS-MoE framework dynamically aligns textual semantics with defect-specific visual patterns through a sparse MoE architecture, where task-relevant experts are adaptively activated based on semantic relevance, resolving inter-class ambiguity. A lightweight MobileSAM encoder enables real-time inference while preserving multi-scale defect details. Extensive experiments on PCB, aluminum foil, and mold defect datasets demonstrate that our framework achieves superior performance compared to existing pure vision models. DS-MoE surpasses YOLOv8/YOLOX with gains of +13.9, +1.4, and +2.0 pp mAP@0.5:0.95 on BBMP, aluminum, and PCB, respectively, while also improving precision and recall.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_26735
institution	arXiv
publishDate	2026
record_format	arxiv
title	Distilled Large Language Model-Driven Dynamic Sparse Expert Activation Mechanism
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2603.26735
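
Note on the contents field: the abstract describes experts that are "adaptively activated based on semantic relevance" between text and visual patterns. The record gives no implementation details, so the following is only a minimal sketch of generic top-k sparse Mixture-of-Experts routing. The cosine-similarity scoring, the softmax gating over the selected experts, and all names (`route_top_k`, `sparse_moe`, `expert_keys`) are illustrative assumptions, not the authors' method.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def route_top_k(text_embedding, expert_keys, k=2):
    """Pick the k experts whose (hypothetical) key vectors best match the text.

    Returns {expert_index: gate_weight}; the weights sum to 1.
    Assumption: relevance is cosine similarity; the paper may score differently.
    """
    scores = [cosine(text_embedding, key) for key in expert_keys]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    return dict(zip(top, weights))


def sparse_moe(features, text_embedding, expert_keys, experts, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = route_top_k(text_embedding, expert_keys, k)
    out = [0.0] * len(features)
    for idx, weight in gates.items():
        expert_out = experts[idx](features)  # unselected experts are never called
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out, gates
```

In this sketch, sparsity comes from evaluating only the k selected experts per input; a real system would use learned embeddings and neural experts rather than the toy vectors and callables assumed here.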

Similar Items