| Main Authors: | de Avalle, Guillermo Gil; Maruster, Laura; Emmanouilidis, Christos |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.22754 |
Similar Items
FlowExtract: Procedural Knowledge Extraction from Maintenance Flowcharts
by: de Avalle, Guillermo Gil, et al.
Published: (2026)
SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding
by: Radwan, Ahmed Y., et al.
Published: (2026)
Generative AI for Industrial Contour Detection: A Language-Guided Vision System
by: Gong, Liang, et al.
Published: (2025)
Target Prompting for Information Extraction with Vision Language Model
by: Medhi, Dipankar
Published: (2024)
Guiding Video Prediction with Explicit Procedural Knowledge
by: Takenaka, Patrick, et al.
Published: (2024)
Research on Vision-Language Question Answering Models for Industrial Robots
by: Li, Ping, et al.
Published: (2026)
Language-Guided Invariance Probing of Vision-Language Models
by: Lee, Jae Joong
Published: (2025)
FloodVision: Urban Flood Depth Estimation Using Foundation Vision-Language Models and Domain Knowledge Graph
by: Liu, Zhangding, et al.
Published: (2025)
Image2Struct: Benchmarking Structure Extraction for Vision-Language Models
by: Roberts, Josselin Somerville, et al.
Published: (2024)
Effective Damage Data Generation by Fusing Imagery with Human Knowledge Using Vision-Language Models
by: Wei, Jie, et al.
Published: (2025)
Ego: Embedding-Guided Personalization of Vision-Language Models
by: Seifi, Soroush, et al.
Published: (2026)
Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction
by: Khan, Muhammad Tayyab, et al.
Published: (2024)
Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis
by: Zhang, Shengxuming, et al.
Published: (2024)
Are Large Vision-Language Models Ready to Guide Blind and Low-Vision Individuals?
by: Kim, Eunki, et al.
Published: (2025)
Symbolic Rule Extraction from Attention-Guided Sparse Representations in Vision Transformers
by: Padalkar, Parth, et al.
Published: (2025)
VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models
by: Si, Shengyu, et al.
Published: (2026)
Delineating Knowledge Boundaries for Honest Large Vision-Language Models
by: Song, Junru, et al.
Published: (2026)
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction
by: Maeda, Koki, et al.
Published: (2024)
Generative Digital Twins: Vision-Language Simulation Models for Executable Industrial Systems
by: Hsu, YuChe, et al.
Published: (2025)
DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models
by: Shi, Zhiyi, et al.
Published: (2025)
IMPACT: A Dataset for Multi-Granularity Human Procedural Action Understanding in Industrial Assembly
by: Wen, Di, et al.
Published: (2026)
ATA: Bridging Implicit Reasoning with Attention-Guided and Action-Guided Inference for Vision-Language Action Models
by: Yang, Cheng, et al.
Published: (2026)
When Seeing Overrides Knowing: Disentangling Knowledge Conflicts in Vision-Language Models
by: Ortu, Francesco, et al.
Published: (2025)
Locatability-Guided Adaptive Reasoning for Image Geo-Localization with Vision-Language Models
by: Yu, Bo, et al.
Published: (2026)
On the Utility of Foundation Models for Fast MRI: Vision-Language-Guided Image Reconstruction
by: Feng, Ruimin, et al.
Published: (2025)
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
by: Zhan, Yufei, et al.
Published: (2025)
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization
by: Xie, Yuxi, et al.
Published: (2024)
Less is More: Label-Guided Summarization of Procedural and Instructional Videos
by: Rajpal, Shreya, et al.
Published: (2026)
Sanitizing Manufacturing Dataset Labels Using Vision-Language Models
by: Mahjourian, Nazanin, et al.
Published: (2025)
LLaVA-CKD: Bottom-Up Cascaded Knowledge Distillation for Vision-Language Models
by: Gkalelis, Nikolaos, et al.
Published: (2026)
Is There Knowledge Left to Extract? Evidence of Fragility in Medically Fine-Tuned Vision-Language Models
by: McLaughlin, Oliver, et al.
Published: (2026)
Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models
by: Li, Shaotian, et al.
Published: (2026)
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model
by: Feng, Qianhan, et al.
Published: (2024)
Revisiting KRISP: A Lightweight Reproduction and Analysis of Knowledge-Enhanced Vision-Language Models
by: Dutta, Souradeep, et al.
Published: (2025)
TernaryCLIP: Efficiently Compressing Vision-Language Models with Ternary Weights and Distilled Knowledge
by: Zhang, Shu-Hao, et al.
Published: (2025)
KRAST: Knowledge-Augmented Robotic Action Recognition with Structured Text for Vision-Language Models
by: Nguyen, Son Hai, et al.
Published: (2025)
SpaceMind: Camera-Guided Modality Fusion for Spatial Reasoning in Vision-Language Models
by: Zhao, Ruosen, et al.
Published: (2025)
Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models
by: Zhang, Naifu, et al.
Published: (2025)
AGMark: Attention-Guided Dynamic Watermarking for Large Vision-Language Models
by: Li, Yue, et al.
Published: (2026)
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
by: Yuan, Kun, et al.
Published: (2024)