| Main Authors: | Shi, Liang; Li, Wei; Beussman, Kevin M; Chen, Lin; Fu, Yun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.14188 |
Similar Items
U-VLM: Hierarchical Vision Language Modeling for Report Generation
by: Shi, Pengcheng, et al.
Published: (2026)
ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding
by: Shi, Liang, et al.
Published: (2024)
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey
by: Shao, Rui, et al.
Published: (2025)
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
by: Tian, Xiaoyu, et al.
Published: (2024)
Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models
by: Li, Ling, et al.
Published: (2025)
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)
PolarVLM: Bridging the Semantic-Physical Gap in Vision-Language Models
by: Li, Yuliang, et al.
Published: (2026)
VLM-Loc: Localization in Point Cloud Maps via Vision-Language Models
by: Kang, Shuhao, et al.
Published: (2026)
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
by: Shi, Jiangbo, et al.
Published: (2025)
VLM-PAR: A Vision Language Model for Pedestrian Attribute Recognition
by: Sellam, Abdellah Zakaria, et al.
Published: (2025)
Q-VLM: Post-training Quantization for Large Vision-Language Models
by: Wang, Changyuan, et al.
Published: (2024)
ContextVLM: Zero-Shot and Few-Shot Context Understanding for Autonomous Driving using Vision Language Models
by: Sural, Shounak, et al.
Published: (2024)
OmniVLM: A Token-Compressed, Sub-Billion-Parameter Vision-Language Model for Efficient On-Device Inference
by: Chen, Wei, et al.
Published: (2024)
HanMoVLM: Large Vision-Language Models for Professional Artistic Painting Evaluation
by: Yang, Hongji, et al.
Published: (2026)
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
by: Li, Juncheng, et al.
Published: (2025)
From Behavioral Performance to Internal Competence: Interpreting Vision-Language Models with VLM-Lens
by: Sheta, Hala, et al.
Published: (2025)
RelationVLM: Making Large Vision-Language Models Understand Visual Relations
by: Huang, Zhipeng, et al.
Published: (2024)
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
by: Zhu, William Yicheng, et al.
Published: (2024)
TrojVLM: Backdoor Attack Against Vision Language Models
by: Lyu, Weimin, et al.
Published: (2024)
EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence
by: She, Chaoyin, et al.
Published: (2025)
Immuno-VLM: Immunizing Large Vision-Language Models via Generative Semantic Antibodies for Open-World Trustworthiness
by: Fang, Xiang, et al.
Published: (2026)
FloorplanVLM: A Vision-Language Model for Floorplan Vectorization
by: Liu, Yuanqing, et al.
Published: (2026)
SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models
by: Lim, Gyubeum, et al.
Published: (2025)
VLM-UQBench: A Benchmark for Modality-Specific and Cross-Modality Uncertainties in Vision Language Models
by: Wang, Chenyu, et al.
Published: (2026)
Towards Multimodal In-Context Learning for Vision & Language Models
by: Doveh, Sivan, et al.
Published: (2024)
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
by: Fan, Zhiwen, et al.
Published: (2025)
A Text-Guided Vision Model for Enhanced Recognition of Small Instances
by: Jung, Hyun-Ki
Published: (2026)
SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models
by: Liu, Zheng, et al.
Published: (2024)
Visual In-Context Learning for Large Vision-Language Models
by: Zhou, Yucheng, et al.
Published: (2024)
FastVLM: Efficient Vision Encoding for Vision Language Models
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2024)
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
by: Danish, Muhammad Sohail, et al.
Published: (2024)
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
by: Chu, Xiangxiang, et al.
Published: (2023)
SpaceVLM: Sub-Space Modeling of Negation in Vision-Language Models
by: Ranjbar, Sepehr Kazemi, et al.
Published: (2025)
IAP: Improving Continual Learning of Vision-Language Models via Instance-Aware Prompting
by: Fu, Hao, et al.
Published: (2025)
MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection
by: Wang, Kuo, et al.
Published: (2024)
Shape and Texture Recognition in Large Vision-Language Models
by: Eppel, Sagi, et al.
Published: (2025)
OViP: Online Vision-Language Preference Learning for VLM Hallucination
by: Liu, Shujun, et al.
Published: (2025)
Revisiting Data Auditing in Large Vision-Language Models
by: Zhu, Hongyu, et al.
Published: (2025)
Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting
by: Zhong, Siru, et al.
Published: (2025)
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
by: Sun, Fan-Yun, et al.
Published: (2024)