| Main Authors: | Qian, Chen, Yu, Xinran, Huang, Zewen, Li, Danyang, Ma, Qiang, Dang, Fan, Ding, Xuan, Shang, Guangyong, Yang, Zheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.12638 |
Similar Items
SwiftVLM: Efficient Vision-Language Model Inference via Cross-Layer Token Bypass
by: Qian, Chen, et al.
Published: (2026)
OpenMoCap: Rethinking Optical Motion Capture under Real-world Occlusion
by: Qian, Chen, et al.
Published: (2025)
OpenMap: Instruction Grounding via Open-Vocabulary Visual-Language Mapping
by: Li, Danyang, et al.
Published: (2025)
Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM
by: Yu, Lei, et al.
Published: (2025)
VLM-CAD: VLM-Optimized Collaborative Agent Design Workflow for Analog Circuit Sizing
by: Pan, Guanyuan, et al.
Published: (2026)
Swim2Real: VLM-Guided System Identification for Sim-to-Real Transfer
by: Qiu, Kevin, et al.
Published: (2026)
Critical edge statistics for deformed GinUEs
by: Liu, Dang-Zheng, et al.
Published: (2023)
Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
by: Wu, Chengyue, et al.
Published: (2026)
DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving
by: Fu, Yongjie, et al.
Published: (2024)
Physics-Guided VLM Priors for All-Cloud Removal
by: Xu, Liying, et al.
Published: (2026)
DocVLM: Make Your VLM an Efficient Reader
by: Nacson, Mor Shpigel, et al.
Published: (2024)
VLM-UDMC: VLM-Enhanced Unified Decision-Making and Motion Control for Urban Autonomous Driving
by: Liu, Haichao, et al.
Published: (2025)
Efficient Cloud-edge Collaborative Approaches to SPARQL Queries over Large RDF graphs
by: Ma, Shidan, et al.
Published: (2026)
Small-Large Collaboration: Training-efficient Concept Personalization for Large VLM using a Meta Personalized Small VLM
by: Yang, Sihan, et al.
Published: (2025)
VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm
by: Wu, Zhenkai, et al.
Published: (2025)
CoDriveVLM: VLM-Enhanced Urban Cooperative Dispatching and Motion Planning for Future Autonomous Mobility on Demand Systems
by: Liu, Haichao, et al.
Published: (2025)
Root Cause Localization for Microservice Systems in Cloud-edge Collaborative Environments
by: Zhu, Yuhan, et al.
Published: (2024)
VLM6D: VLM based 6Dof Pose Estimation based on RGB-D Images
by: Sarowar, Md Selim, et al.
Published: (2025)
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
by: Xu, Runsen, et al.
Published: (2024)
Multilingual VLM Training: Adapting an English-Trained VLM to French
by: Lahmi, Jules, et al.
Published: (2025)
TRANSPORTER: Transferring Visual Semantics from VLM Manifolds
by: Stergiou, Alexandros
Published: (2025)
GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment
by: Yao, Yao, et al.
Published: (2024)
Capturing Gaze Shifts for Guidance: Cross-Modal Fusion Enhancement for VLM Hallucination Mitigation
by: Qi, Zheng, et al.
Published: (2025)
Rethinking Intermediate Representation for VLM-based Robot Manipulation
by: Tang, Weiliang, et al.
Published: (2025)
VLM-SFD: VLM-Assisted Siamese Flow Diffusion Framework for Dual-Arm Cooperative Manipulation
by: Chen, Jiaming, et al.
Published: (2025)
EMAC+: Embodied Multimodal Agent for Collaborative Planning with VLM+LLM
by: Ao, Shuang, et al.
Published: (2025)
VLM-TDP: VLM-guided Trajectory-conditioned Diffusion Policy for Robust Long-Horizon Manipulation
by: Huang, Kefeng, et al.
Published: (2025)
Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning
by: Huang, Yihong, et al.
Published: (2026)
REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation
by: Xue, Xizhe, et al.
Published: (2024)
EO-VLM: VLM-Guided Energy Overload Attacks on Vision Models
by: Seo, Minjae, et al.
Published: (2025)
LensVLM: Selective Context Expansion for Compressed Visual Representation of Text
by: Xie, Roy, et al.
Published: (2026)
Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration
by: Park, ChaeHun, et al.
Published: (2024)
MindPower: Enabling Theory-of-Mind Reasoning in VLM-based Embodied Agents
by: Zhang, Ruoxuan, et al.
Published: (2025)
SAIL: Test-Time Scaling for In-Context Imitation Learning with VLM
by: Sato, Makoto, et al.
Published: (2026)
VLM-driven Behavior Tree for Context-aware Task Planning
by: Wake, Naoki, et al.
Published: (2025)
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
by: Zhang, Yuan, et al.
Published: (2024)
VLM-KD: Knowledge Distillation from VLM for Long-Tail Visual Recognition
by: Zhang, Zaiwei, et al.
Published: (2024)
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM
by: Chen, Tao, et al.
Published: (2025)
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
by: Zhang, Di, et al.
Published: (2024)
HapticVLM: VLM-Driven Texture Recognition Aimed at Intelligent Haptic Interaction
by: Khan, Muhammad Haris, et al.
Published: (2025)