| Main Authors: | Li, Yaru, Wang, Yanxue, Li, Meng, Li, Xinming, Feng, Jianbo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.10394 |
Similar Items
PneumoLLM: Harnessing the Power of Large Language Model for Pneumoconiosis Diagnosis
by: Song, Meiyue, et al.
Published: (2023)
Causal Tracing of Object Representations in Large Vision Language Models: Mechanistic Interpretability and Hallucination Mitigation
by: Li, Qiming, et al.
Published: (2025)
Seeing the Unseen: Towards Zero-Shot Inspection for Wind Turbine Blades using Knowledge-Augmented Vision Language Models
by: Zhang, Yang, et al.
Published: (2025)
Semantic Alignment for Multimodal Large Language Models
by: Wu, Tao, et al.
Published: (2024)
Specializing Large Models for Oracle Bone Script Interpretation via Component-Grounded Multimodal Knowledge Augmentation
by: Zhang, Jianing, et al.
Published: (2026)
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
by: Wang, Lehan, et al.
Published: (2024)
LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation
by: Li, Zhenshi, et al.
Published: (2024)
Wind Turbine Feature Detection Using Deep Learning and Synthetic Data
by: Shahirpour, Arash, et al.
Published: (2025)
Barely-Visible Surface Crack Detection for Wind Turbine Sustainability
by: Agrawal, Sourav, et al.
Published: (2024)
Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models
by: Patrício, Cristiano, et al.
Published: (2023)
ST-LLM: Large Language Models Are Effective Temporal Learners
by: Liu, Ruyang, et al.
Published: (2024)
Leveraging Vision-Language Large Models for Interpretable Video Action Recognition with Semantic Tokenization
by: Peng, Jingwei, et al.
Published: (2025)
Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis
by: Qin, Yi, et al.
Published: (2026)
Hierarchically-Structured Open-Vocabulary Indoor Scene Synthesis with Pre-trained Large Language Model
by: Sun, Weilin, et al.
Published: (2025)
An Exploratory Study on Abstract Images and Visual Representations Learned from Them
by: Li, Haotian, et al.
Published: (2025)
TrajGATFormer: A Graph-Based Transformer Approach for Worker and Obstacle Trajectory Prediction in Off-site Construction Environments
by: Alduais, Mohammed, et al.
Published: (2025)
Vision-based 3D Semantic Scene Completion via Capture Dynamic Representations
by: Wang, Meng, et al.
Published: (2025)
SVGen: Interpretable Vector Graphics Generation with Large Language Models
by: Wang, Feiyu, et al.
Published: (2025)
VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion
by: Wang, Meng, et al.
Published: (2025)
Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models
by: Luan, Bozhi, et al.
Published: (2025)
Large Language Model Aided Birt-Hogg-Dube Syndrome Diagnosis with Multimodal Retrieval-Augmented Generation
by: Li, Haoqing, et al.
Published: (2025)
Towards Training-free Multimodal Hate Localisation with Large Language Models
by: Sun, Yueming, et al.
Published: (2026)
LLM-AD: Large Language Model based Audio Description System
by: Chu, Peng, et al.
Published: (2024)
HalluCXR: Benchmarking and Mitigating Hallucinations in Medical Vision-Language Models for Chest Radiograph Interpretation
by: Wang, Haoyu, et al.
Published: (2026)
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
by: Stan, Gabriela Ben Melech, et al.
Published: (2024)
KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model
by: Yang, Jie, et al.
Published: (2025)
Conditional Evidence Reconstruction and Decomposition for Interpretable Multimodal Diagnosis
by: Wan, Shaowen, et al.
Published: (2026)
CCDepth: A Lightweight Self-supervised Depth Estimation Network with Enhanced Interpretability
by: Zhang, Xi, et al.
Published: (2024)
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
by: Yao, Linli, et al.
Published: (2024)
Bridge the Points: Graph-based Few-shot Segment Anything Semantically
by: Zhang, Anqi, et al.
Published: (2024)
LangBridge: Interpreting Image as a Combination of Language Embeddings
by: Liao, Jiaqi, et al.
Published: (2025)
Language Model Guided Interpretable Video Action Reasoning
by: Wang, Ning, et al.
Published: (2024)
Test-time Sparsity for Extreme Fast Action Diffusion
by: Ji, Kangye, et al.
Published: (2026)
HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation
by: Liu, Tengfei, et al.
Published: (2024)
Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles
by: Xie, Jun, et al.
Published: (2025)
Identification of Surface Defects on Solar PV Panels and Wind Turbine Blades using Attention based Deep Learning Model
by: Dwivedi, Divyanshi, et al.
Published: (2022)
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
by: Yao, Ziyu, et al.
Published: (2025)
Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization
by: Li, Haosen, et al.
Published: (2026)
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
by: Lian, Long, et al.
Published: (2023)
PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
by: Zheng, Xiaoyun, et al.
Published: (2024)