| Main Authors: | Shi, Pengcheng, Zhang, Minghui, Song, Kehan, Liu, Jiaqi, Gu, Yun, Zhang, Xinglin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.00479 |
Similar Items
Medal S: Spatio-Textual Prompt Model for Medical Segmentation
by: Shi, Pengcheng, et al.
Published: (2025)
IIR-VLM: In-Context Instance-level Recognition for Large Vision-Language Models
by: Shi, Liang, et al.
Published: (2026)
VLM-Loc: Localization in Point Cloud Maps via Vision-Language Models
by: Kang, Shuhao, et al.
Published: (2026)
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
by: Tian, Xiaoyu, et al.
Published: (2024)
TubeMLLM: A Foundation Model for Topology Knowledge Exploration in Vessel-like Anatomy
by: Liu, Yaoyu, et al.
Published: (2026)
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
by: Sun, Fan-Yun, et al.
Published: (2024)
HieroAction: Hierarchically Guided VLM for Fine-Grained Action Analysis
by: Wu, Junhao, et al.
Published: (2025)
Hierarchical Semantic Learning for Multi-Class Aorta Segmentation
by: Shi, Pengcheng
Published: (2025)
PolarVLM: Bridging the Semantic-Physical Gap in Vision-Language Models
by: Li, Yuliang, et al.
Published: (2026)
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)
ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding
by: Shi, Liang, et al.
Published: (2024)
TopoVST: Toward Topology-fidelitous Vessel Skeleton Tracking
by: Liu, Yaoyu, et al.
Published: (2026)
FloorplanVLM: A Vision-Language Model for Floorplan Vectorization
by: Liu, Yuanqing, et al.
Published: (2026)
GeoWorld-VLM: Geometry from World Models for Vision-Language Models
by: Gu, Renjie, et al.
Published: (2026)
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey
by: Shao, Rui, et al.
Published: (2025)
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
by: Zhang, Yuan, et al.
Published: (2024)
RelationVLM: Making Large Vision-Language Models Understand Visual Relations
by: Huang, Zhipeng, et al.
Published: (2024)
Slot-VLM: SlowFast Slots for Video-Language Modeling
by: Xu, Jiaqi, et al.
Published: (2024)
MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder
by: Li, Lei, et al.
Published: (2024)
VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Annotation-Free Pathological Image Classification
by: Zhong, Lanfeng, et al.
Published: (2024)
HOG-Layout: Hierarchical 3D Scene Generation, Optimization and Editing via Vision-Language Models
by: Jiang, Haiyan, et al.
Published: (2026)
WalkVLM: Aid Visually Impaired People Walking by Vision Language Model
by: Yuan, Zhiqiang, et al.
Published: (2024)
Vision Transformers with Hierarchical Attention
by: Liu, Yun, et al.
Published: (2021)
TrojVLM: Backdoor Attack Against Vision Language Models
by: Lyu, Weimin, et al.
Published: (2024)
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
by: Danish, Muhammad Sohail, et al.
Published: (2024)
SpaceVLM: Sub-Space Modeling of Negation in Vision-Language Models
by: Ranjbar, Sepehr Kazemi, et al.
Published: (2025)
Shape-aware Sampling Matters in the Modeling of Multi-Class Tubular Structures
by: Zhang, Minghui, et al.
Published: (2025)
SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models
by: Makarov, Vladislav, et al.
Published: (2026)
Distribution-Based Masked Medical Vision-Language Model Using Structured Reports
by: Gowda, Shreyank N, et al.
Published: (2025)
TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation
by: Liu, Jiaxing, et al.
Published: (2026)
Fairness-Aware Fine-Tuning of Vision-Language Models for Medical Glaucoma Diagnosis
by: Gu, Zijian, et al.
Published: (2025)
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model
by: Shen, Haozhan, et al.
Published: (2025)
CogVLM2: Visual Language Models for Image and Video Understanding
by: Hong, Wenyi, et al.
Published: (2024)
HybridToken-VLM: Hybrid Token Compression for Vision-Language Models
by: Zhang, Jusheng, et al.
Published: (2025)
SpecVLM: Fast Speculative Decoding in Vision-Language Models
by: Huang, Haiduo, et al.
Published: (2025)
Q-VLM: Post-training Quantization for Large Vision-Language Models
by: Wang, Changyuan, et al.
Published: (2024)
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
by: Li, Juncheng, et al.
Published: (2025)
RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
by: Liu, Hanqing, et al.
Published: (2026)
CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
by: Wu, Sifan, et al.
Published: (2024)
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
by: Chu, Xiangxiang, et al.
Published: (2024)