| Main Authors: | Ye, Hongfei, Chen, Bin, Liu, Wenxi, Zhang, Yu, Li, Zhao, Ni, Dandan, Chen, Hongyang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.11153 |
Similar Items
IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models
by: Chen, Zhihao, et al.
Published: (2023)
Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models
by: Zhou, Zijie, et al.
Published: (2026)
PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models
by: Meng, Yu, et al.
Published: (2025)
Evaluating Large Vision-language Models for Surgical Tool Detection
by: Poudel, Nakul, et al.
Published: (2026)
SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
by: Xie, Shaoan, et al.
Published: (2025)
PyVision-RL: Forging Open Agentic Vision Models via RL
by: Zhao, Shitian, et al.
Published: (2026)
POINTS: Improving Your Vision-language Model with Affordable Strategies
by: Liu, Yuan, et al.
Published: (2024)
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
by: Bao, Han, et al.
Published: (2024)
OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding
by: Fu, Teng, et al.
Published: (2025)
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
by: Huo, Fushuo, et al.
Published: (2024)
Physical Prompt Injection Attacks on Large Vision-Language Models
by: Ling, Chen, et al.
Published: (2026)
Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models
by: Chen, Xinyu, et al.
Published: (2025)
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
by: Zhao, Haozhe, et al.
Published: (2023)
Large Vision-Language Models as Emotion Recognizers in Context Awareness
by: Lei, Yuxuan, et al.
Published: (2024)
Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training
by: Cao, Weiwei, et al.
Published: (2025)
QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models
by: Zhang, Yudong, et al.
Published: (2025)
Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models
by: Liang, Yuxuan, et al.
Published: (2025)
A Visual Semantic Adaptive Watermark grounded by Prefix-Tuning for Large Vision-Language Model
by: Zheng, Qi, et al.
Published: (2026)
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
by: Chen, Jiuhai, et al.
Published: (2024)
Trustworthy Large Models in Vision: A Survey
by: Guo, Ziyan, et al.
Published: (2023)
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
by: Zhang, Zilun, et al.
Published: (2025)
Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset
by: Chen, Qian, et al.
Published: (2026)
Multimodal Emotion Recognition with Vision-language Prompting and Modality Dropout
by: QI, Anbin, et al.
Published: (2024)
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
by: Chen, Zhanpeng, et al.
Published: (2025)
RAU: Reference-based Anatomical Understanding with Vision Language Models
by: Li, Yiwei, et al.
Published: (2025)
Delineating Knowledge Boundaries for Honest Large Vision-Language Models
by: Song, Junru, et al.
Published: (2026)
White-box Multimodal Jailbreaks Against Large Vision-Language Models
by: Wang, Ruofan, et al.
Published: (2024)
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
by: Chen, Ying, et al.
Published: (2024)
VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?
by: Zhao, Hongbo, et al.
Published: (2025)
Assessing Privacy Preservation and Utility in Online Vision-Language Models
by: Chaudhari, Karmesh Siddharam, et al.
Published: (2026)
SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer
by: Li, Wenxi, et al.
Published: (2025)
PanMatch: Unleashing the Potential of Large Vision Models for Unified Matching Models
by: Zhang, Yongjian, et al.
Published: (2025)
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification
by: He, Yefei, et al.
Published: (2024)
Attention Prompting on Image for Large Vision-Language Models
by: Yu, Runpeng, et al.
Published: (2024)
AddressVLM: Cross-view Alignment Tuning for Image Address Localization using Large Vision-Language Models
by: Xu, Shixiong, et al.
Published: (2025)
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
by: Huang, Wenxuan, et al.
Published: (2026)
A Structured Review of Underwater Object Detection Challenges and Solutions: From Traditional to Large Vision Language Models
by: Nabahirwa, Edwine, et al.
Published: (2025)
Subspace Alignment for Vision-Language Model Test-time Adaptation
by: Zeng, Zhichen, et al.
Published: (2026)
Large Vision-Language Models Get Lost in Attention
by: Xi, Gongli, et al.
Published: (2026)