| Main Authors: | Chen, Ce, Ren, Yi, Li, Yuanming, Goriachko, Viktor, Ye, Zhenhui, Guo, Zujin, Hong, Zhibin, Gong, Mingming |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.27975 |
Similar Items
Generate Your Talking Avatar from Video Reference
by: Guo, Zujin, et al.
Published: (2026)
Navigation with VLM framework: Towards Going to Any Language
by: Yin, Zecheng, et al.
Published: (2024)
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
by: Guo, Yuliang, et al.
Published: (2025)
Enhancing Vision-Language Few-Shot Adaptation with Negative Learning
by: Zhang, Ce, et al.
Published: (2024)
TinyVLM: Zero-Shot Object Detection on Microcontrollers via Vision-Language Distillation with Matryoshka Embeddings
by: Wilson, Bibin
Published: (2026)
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
by: Danish, Muhammad Sohail, et al.
Published: (2024)
MonitorVLM: A Vision Language Framework for Safety Violation Detection in Mining Operations
by: Wu, Jiang, et al.
Published: (2025)
Exosomes: From Non‐Invasive Detection to Engineered Targeted Therapy
by: Mingming Sun, et al.
Published: (2026)
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
by: Li, Juncheng, et al.
Published: (2025)
GraphVLM: Benchmarking Vision Language Models for Multimodal Graph Learning
by: Liu, Jiajin, et al.
Published: (2026)
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
by: Ahn, Sunghyun, et al.
Published: (2025)
ContextVLM: Zero-Shot and Few-Shot Context Understanding for Autonomous Driving using Vision Language Models
by: Sural, Shounak, et al.
Published: (2024)
HUGE-Bench: A Benchmark for High-Level UAV Vision-Language-Action Tasks
by: Guo, Jingyu, et al.
Published: (2026)
SpecVLM: Fast Speculative Decoding in Vision-Language Models
by: Huang, Haiduo, et al.
Published: (2025)
VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models
by: Saxena, Rohit, et al.
Published: (2026)
SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models
by: Chen, Pingyi, et al.
Published: (2025)
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)
Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation
by: Zhao, Ran, et al.
Published: (2026)
Eyes on VLM: Benchmarking Gaze Following and Social Gaze Prediction in Vision Language Models
by: Wang, Hengfei, et al.
Published: (2026)
VLM-RRT: Vision Language Model Guided RRT Search for Autonomous UAV Navigation
by: Ye, Jianlin, et al.
Published: (2025)
AnyTrans: Translate AnyText in the Image with Large Scale Models
by: Qian, Zhipeng, et al.
Published: (2024)
Lite Any Stereo: Efficient Zero-Shot Stereo Matching
by: Jing, Junpeng, et al.
Published: (2025)
AnySlot: Goal-Conditioned Vision-Language-Action Policies for Zero-Shot Slot-Level Placement
by: Hu, Zhaofeng, et al.
Published: (2026)
AnyCamVLA: Zero-Shot Camera Adaptation for Viewpoint Robust Vision-Language-Action Models
by: Heo, Hyeongjun, et al.
Published: (2026)
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
by: Xu, Zhenlin, et al.
Published: (2023)
DVLA-RL: Dual-Level Vision-Language Alignment with Reinforcement Learning Gating for Few-Shot Learning
by: Li, Wenhao, et al.
Published: (2026)
TransPrune: Token Transition Pruning for Efficient Large Vision-Language Model
by: Li, Ao, et al.
Published: (2025)
ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness
by: Guo, Ce, et al.
Published: (2025)
PolarVLM: Bridging the Semantic-Physical Gap in Vision-Language Models
by: Li, Yuliang, et al.
Published: (2026)
VLM's Eye Examination: Instruct and Inspect Visual Competency of Vision Language Models
by: Hyeon-Woo, Nam, et al.
Published: (2024)
VLM-UQBench: A Benchmark for Modality-Specific and Cross-Modality Uncertainties in Vision Language Models
by: Wang, Chenyu, et al.
Published: (2026)
SurgVLM: A Large Vision-Language Model and Systematic Evaluation Benchmark for Surgical Intelligence
by: Zeng, Zhitao, et al.
Published: (2025)
ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding
by: Shi, Liang, et al.
Published: (2024)
FastVLM: Efficient Vision Encoding for Vision Language Models
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2024)
VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning
by: Li, Wenhao, et al.
Published: (2025)
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
by: Zhu, William Yicheng, et al.
Published: (2024)
HybridToken-VLM: Hybrid Token Compression for Vision-Language Models
by: Zhang, Jusheng, et al.
Published: (2025)
Knowledge Distillation for Underwater Feature Extraction and Matching via GAN-synthesized Images
by: Yang, Jinghe, et al.
Published: (2025)
RelationVLM: Making Large Vision-Language Models Understand Visual Relations
by: Huang, Zhipeng, et al.
Published: (2024)
H2VLR: Heterogeneous Hypergraph Vision-Language Reasoning for Few-Shot Anomaly Detection
by: Huang, Jianghong, et al.
Published: (2026)