:: Library Catalog

Bibliographic Details
Main Authors:	de Avalle, Guillermo Gil; Maruster, Laura; Sloot, Eric; Emmanouilidis, Christos
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.06770

Similar Items

Procedural Knowledge Extraction from Industrial Troubleshooting Guides Using Vision Language Models
by: de Avalle, Guillermo Gil, et al.
Published: (2026)

FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding
by: Pan, Huitong, et al.
Published: (2024)

SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding
by: Radwan, Ahmed Y., et al.
Published: (2026)

JSynFlow: Japanese Synthesised Flowchart Visual Question Answering Dataset built with Large Language Models
by: Sasaki, Hiroshi
Published: (2026)

EdgeFlow: Edge-Map Augmented VLM-Based Flowchart Processing for Industrial Requirements Engineering
by: Dou, Zhifei, et al.
Published: (2026)

First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models
by: Zhang, Enming, et al.
Published: (2024)

An Online Reference-Free Evaluation Framework for Flowchart Image-to-Code Generation
by: Nguyen, Giang Son, et al.
Published: (2026)

Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding
by: Omasa, Takamitsu, et al.
Published: (2025)

Towards Making Flowchart Images Machine Interpretable
by: Shukla, Shreya, et al.
Published: (2025)

Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
by: Yuan, Kun, et al.
Published: (2024)

Guiding Video Prediction with Explicit Procedural Knowledge
by: Takenaka, Patrick, et al.
Published: (2024)

FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts
by: Zhang, Ziyi, et al.
Published: (2025)

Separating Knowledge and Perception with Procedural Data
by: Rodríguez-Muñoz, Adrián, et al.
Published: (2025)

Is There Knowledge Left to Extract? Evidence of Fragility in Medically Fine-Tuned Vision-Language Models
by: McLaughlin, Oliver, et al.
Published: (2026)

Optical Flow Matters: an Empirical Comparative Study on Fusing Monocular Extracted Modalities for Better Steering
by: Makiyeh, Fouad, et al.
Published: (2024)

Procedural terrain generation with style transfer
by: Merizzi, Fabio
Published: (2024)

FullFlow: Upgrading Text-to-Image Flow Matching Models for Bidirectional Vision-Language Generation
by: Bill, Eric Tillmann, et al.
Published: (2026)

Detection-Fusion for Knowledge Graph Extraction from Videos
by: Das, Taniya, et al.
Published: (2024)

Learning Robust Intervention Representations with Delta Embeddings
by: Alimisis, Panagiotis, et al.
Published: (2025)

Spatiotemporal Object Detection for Improved Aerial Vehicle Detection in Traffic Monitoring
by: Telegraph, Kristina, et al.
Published: (2024)

Motion-Boundary-Driven Unsupervised Surgical Instrument Segmentation in Low-Quality Optical Flow
by: Liu, Yang, et al.
Published: (2024)

ViPro: Enabling and Controlling Video Prediction for Complex Dynamical Scenarios using Procedural Knowledge
by: Takenaka, Patrick, et al.
Published: (2024)

PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs
by: Assouel, Rim, et al.
Published: (2026)

Less is More: Label-Guided Summarization of Procedural and Instructional Videos
by: Rajpal, Shreya, et al.
Published: (2026)

SceneX: Procedural Controllable Large-scale Scene Generation
by: Zhou, Mengqi, et al.
Published: (2024)

Enhanced Cascade Prostate Cancer Classifier in mp-MRI Utilizing Recall Feedback Adaptive Loss and Prior Knowledge-Based Feature Extraction
by: Luo, Kun, et al.
Published: (2024)

CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
by: Chen, Shuhang, et al.
Published: (2026)

Modality Translation for Object Detection Adaptation Without Forgetting Prior Knowledge
by: Medeiros, Heitor Rapela, et al.
Published: (2024)

MMCL-Bench: Multimodal Context Learning from Visual Rules, Procedures, and Evidence
by: Chen, Yifan, et al.
Published: (2026)

CityX: Controllable Procedural Content Generation for Unbounded 3D Cities
by: Zhang, Shougao, et al.
Published: (2024)

Toward Efficient Convolutional Neural Networks With Structured Ternary Patterns
by: Kyrkou, Christos
Published: (2024)

FACE: Faithful Automatic Concept Extraction
by: Bhusal, Dipkamal, et al.
Published: (2025)

Sharingan: Extract User Action Sequence from Desktop Recordings
by: Chen, Yanting, et al.
Published: (2024)

IMPACT: A Dataset for Multi-Granularity Human Procedural Action Understanding in Industrial Assembly
by: Wen, Di, et al.
Published: (2026)

ReXSonoVQA: A Video QA Benchmark for Procedure-Centric Ultrasound Understanding
by: Wang, Xucheng, et al.
Published: (2026)

Designing and Generating Diverse, Equitable Face Image Datasets for Face Verification Tasks
by: Baltsou, Georgia, et al.
Published: (2025)

Personalized Federated Learning for Cross-view Geo-localization
by: Anagnostopoulos, Christos, et al.
Published: (2024)

Masked Generative Story Transformer with Character Guidance and Caption Augmentation
by: Papadimitriou, Christos, et al.
Published: (2024)

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
by: Huang, Yushi, et al.
Published: (2023)

A Stitch in Time: Learning Procedural Workflow via Self-Supervised Plackett-Luce Ranking
by: Che, Chengan, et al.
Published: (2025)