Library Catalog

Bibliographic Details
Main Authors:	Li, Yaru; Wang, Yanxue; Li, Meng; Li, Xinming; Feng, Jianbo
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.10394

Similar Items

PneumoLLM: Harnessing the Power of Large Language Model for Pneumoconiosis Diagnosis
by: Song, Meiyue, et al.
Published: (2023)

Causal Tracing of Object Representations in Large Vision Language Models: Mechanistic Interpretability and Hallucination Mitigation
by: Li, Qiming, et al.
Published: (2025)

Seeing the Unseen: Towards Zero-Shot Inspection for Wind Turbine Blades using Knowledge-Augmented Vision Language Models
by: Zhang, Yang, et al.
Published: (2025)

Semantic Alignment for Multimodal Large Language Models
by: Wu, Tao, et al.
Published: (2024)

Specializing Large Models for Oracle Bone Script Interpretation via Component-Grounded Multimodal Knowledge Augmentation
by: Zhang, Jianing, et al.
Published: (2026)

Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
by: Wang, Lehan, et al.
Published: (2024)

LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation
by: Li, Zhenshi, et al.
Published: (2024)

Wind Turbine Feature Detection Using Deep Learning and Synthetic Data
by: Shahirpour, Arash, et al.
Published: (2025)

Barely-Visible Surface Crack Detection for Wind Turbine Sustainability
by: Agrawal, Sourav, et al.
Published: (2024)

Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models
by: Patrício, Cristiano, et al.
Published: (2023)

ST-LLM: Large Language Models Are Effective Temporal Learners
by: Liu, Ruyang, et al.
Published: (2024)

Leveraging Vision-Language Large Models for Interpretable Video Action Recognition with Semantic Tokenization
by: Peng, Jingwei, et al.
Published: (2025)

Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis
by: Qin, Yi, et al.
Published: (2026)

Hierarchically-Structured Open-Vocabulary Indoor Scene Synthesis with Pre-trained Large Language Model
by: Sun, Weilin, et al.
Published: (2025)

An Exploratory Study on Abstract Images and Visual Representations Learned from Them
by: Li, Haotian, et al.
Published: (2025)

TrajGATFormer: A Graph-Based Transformer Approach for Worker and Obstacle Trajectory Prediction in Off-site Construction Environments
by: Alduais, Mohammed, et al.
Published: (2025)

Vision-based 3D Semantic Scene Completion via Capture Dynamic Representations
by: Wang, Meng, et al.
Published: (2025)

SVGen: Interpretable Vector Graphics Generation with Large Language Models
by: Wang, Feiyu, et al.
Published: (2025)

VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion
by: Wang, Meng, et al.
Published: (2025)

Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models
by: Luan, Bozhi, et al.
Published: (2025)

Large Language Model Aided Birt-Hogg-Dube Syndrome Diagnosis with Multimodal Retrieval-Augmented Generation
by: Li, Haoqing, et al.
Published: (2025)

Towards Training-free Multimodal Hate Localisation with Large Language Models
by: Sun, Yueming, et al.
Published: (2026)

LLM-AD: Large Language Model based Audio Description System
by: Chu, Peng, et al.
Published: (2024)

HalluCXR: Benchmarking and Mitigating Hallucinations in Medical Vision-Language Models for Chest Radiograph Interpretation
by: Wang, Haoyu, et al.
Published: (2026)

LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
by: Stan, Gabriela Ben Melech, et al.
Published: (2024)

KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model
by: Yang, Jie, et al.
Published: (2025)

Conditional Evidence Reconstruction and Decomposition for Interpretable Multimodal Diagnosis
by: Wan, Shaowen, et al.
Published: (2026)

CCDepth: A Lightweight Self-supervised Depth Estimation Network with Enhanced Interpretability
by: Zhang, Xi, et al.
Published: (2024)

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
by: Yao, Linli, et al.
Published: (2024)

Bridge the Points: Graph-based Few-shot Segment Anything Semantically
by: Zhang, Anqi, et al.
Published: (2024)

LangBridge: Interpreting Image as a Combination of Language Embeddings
by: Liao, Jiaqi, et al.
Published: (2025)

Language Model Guided Interpretable Video Action Reasoning
by: Wang, Ning, et al.
Published: (2024)

Test-time Sparsity for Extreme Fast Action Diffusion
by: Ji, Kangye, et al.
Published: (2026)

HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation
by: Liu, Tengfei, et al.
Published: (2024)

Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles
by: Xie, Jun, et al.
Published: (2025)

Identification of Surface Defects on Solar PV Panels and Wind Turbine Blades using Attention based Deep Learning Model
by: Dwivedi, Divyanshi, et al.
Published: (2022)

CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
by: Yao, Ziyu, et al.
Published: (2025)

Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization
by: Li, Haosen, et al.
Published: (2026)

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
by: Lian, Long, et al.
Published: (2023)

PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
by: Zheng, Xiaoyun, et al.
Published: (2024)