Library Catalog

Bibliographic Details
Main Authors:	Wang, Xuesong, Wang, Caisheng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.08069

Similar Items

An Improved Anomaly Detection Model for Automated Inspection of Power Line Insulators
by: Das, Laya, et al.
Published: (2023)

Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection
by: Wang, Jiahao, et al.
Published: (2024)

Integrating Artificial Intelligence Models and Synthetic Image Data for Enhanced Asset Inspection and Defect Identification
by: Mandati, Reddy, et al.
Published: (2024)

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
by: Zhang, Wenqi, et al.
Published: (2024)

Seeing the Evidence, Missing the Answer: Tool-Guided Vision-Language Models on Visual Illusions
by: Wang, Xuesong, et al.
Published: (2026)

Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?
by: Yao, Yang, et al.
Published: (2025)

Model-Based Real-Time Pose and Sag Estimation of Overhead Power Lines Using LiDAR for Drone Inspection
by: Girard, Alexandre, et al.
Published: (2025)

MM-R1: Unleashing the Power of Unified Multimodal Large Language Models for Personalized Image Generation
by: Liang, Qian, et al.
Published: (2025)

Large Language Models for Multimodal Deformable Image Registration
by: Ma, Mingrui, et al.
Published: (2024)

synth-dacl: Does Synthetic Defect Data Enhance Segmentation Accuracy and Robustness for Real-World Bridge Inspections?
by: Flotzinger, Johannes, et al.
Published: (2025)

DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection
by: Song, Jaewoo, et al.
Published: (2025)

LLMGA: Multimodal Large Language Model based Generation Assistant
by: Xia, Bin, et al.
Published: (2023)

From Prediction to Diagnosis: Reasoning-Aware AI for Photovoltaic Defect Inspection
by: Mistry, Dev, et al.
Published: (2026)

UniPCB: A Generation-Assisted Detection Framework for PCB Defect Inspection
by: Zhang, Huan, et al.
Published: (2026)

An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation
by: Tan, Zhiyu, et al.
Published: (2024)

ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models
by: Tan, Chuangchuang, et al.
Published: (2025)

AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks
by: Li, You, et al.
Published: (2024)

Harnessing the Power of Large Vision Language Models for Synthetic Image Detection
by: Keita, Mamadou, et al.
Published: (2024)

Multimodal Large Language Models as Image Classifiers
by: Kisel, Nikita, et al.
Published: (2026)

VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model
by: Yang, Jinze, et al.
Published: (2024)

Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models
by: Meng, Chutian, et al.
Published: (2024)

Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
by: Wen, Siwei, et al.
Published: (2025)

Kosmos-G: Generating Images in Context with Multimodal Large Language Models
by: Pan, Xichen, et al.
Published: (2023)

An Incremental Unified Framework for Small Defect Inspection
by: Tang, Jiaqi, et al.
Published: (2023)

Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models
by: Zhang, Guosheng, et al.
Published: (2025)

Transmission Line Defect Detection Based on UAV Patrol Images and Vision-language Pretraining
by: Zhang, Ke, et al.
Published: (2024)

Fully-Synthetic Training for Visual Quality Inspection in Automotive Production
by: Huber, Christoph, et al.
Published: (2025)

Guiding Instruction-based Image Editing via Multimodal Large Language Models
by: Fu, Tsu-Jui, et al.
Published: (2023)

Safety of Multimodal Large Language Models on Images and Texts
by: Liu, Xin, et al.
Published: (2024)

Enhancing Power Grid Inspections with Machine Learning
by: Lavado, Diogo, et al.
Published: (2025)

MM-UAVBench: How Well Do Multimodal Large Language Models See, Think, and Plan in Low-Altitude UAV Scenarios?
by: Dai, Shiqi, et al.
Published: (2025)

AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models
by: Gao, Yifei, et al.
Published: (2024)

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
by: Tian, Ye, et al.
Published: (2025)

ThinkFake: Reasoning in Multimodal Large Language Models for AI-Generated Image Detection
by: Huang, Tai-Ming, et al.
Published: (2025)

Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
by: Zhao, Henry Hengyuan, et al.
Published: (2023)

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
by: Li, Junxian, et al.
Published: (2024)

Grounding Everything in Tokens for Multimodal Large Language Models
by: Ren, Xiangxuan, et al.
Published: (2025)

Mirage: Unveiling Hidden Artifacts in Synthetic Images with Large Vision-Language Models
by: Sharma, Pranav, et al.
Published: (2025)

Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model
by: Li, Mingxing, et al.
Published: (2025)

Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning
by: Liu, Shih-Wen, et al.
Published: (2025)