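Staff View: Procedural Knowledge Extraction from Industrial Troubleshooting Guides Using Vision Language Models :: Library Catalog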

Bibliographic Details
Main Authors:	de Avalle, Guillermo Gil; Maruster, Laura; Emmanouilidis, Christos
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.22754

author	de Avalle, Guillermo Gil; Maruster, Laura; Emmanouilidis, Christos
contents	Industrial troubleshooting guides encode diagnostic procedures in flowchart-like diagrams where spatial layout and technical language jointly convey meaning. To integrate this knowledge into operator support systems, which assist shop-floor personnel in diagnosing and resolving equipment issues, the information must first be extracted and structured for machine interpretation. However, when performed manually, this extraction is labor-intensive and error-prone. Vision Language Models (VLMs) offer the potential to automate this process by jointly interpreting visual and textual meaning, yet their performance on such guides remains underexplored. This paper evaluates two VLMs on extracting structured knowledge, comparing two prompting strategies: standard instruction-guided prompting versus an augmented approach that cues troubleshooting layout patterns. Results reveal model-specific trade-offs between layout sensitivity and semantic robustness, informing practical deployment decisions.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_22754
institution	arXiv
publishDate	2026
record_format	arxiv
title	Procedural Knowledge Extraction from Industrial Troubleshooting Guides Using Vision Language Models
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2601.22754