Library Catalog

Bibliographic Details
Main Authors:	Lan, Guanzhou; Liao, Chenyi; Yang, Yuqi; Ma, Qianli; Wang, Zhigang; Wang, Dong; Zhao, Bin; Li, Xuelong
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.04565

Similar Items

Night-to-Day Translation via Illumination Degradation Disentanglement
by: Lan, Guanzhou, et al.
Published: (2024)

Efficient Diffusion as Low Light Enhancer
by: Lan, Guanzhou, et al.
Published: (2024)

Open-Vocabulary Octree-Graph for 3D Scene Understanding
by: Wang, Zhigang, et al.
Published: (2024)

Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation
by: Zhang, Pingrui, et al.
Published: (2025)

UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding
by: Zhang, Da, et al.
Published: (2025)

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
by: Yan, Chi, et al.
Published: (2023)

Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy
by: Wu, Pengyuan, et al.
Published: (2026)

Q-GeoMem: Question-Guided Geometric Memory for Video Spatial Reasoning
by: Gao, Xianqiang, et al.
Published: (2026)

RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling
by: Gao, Bingjie, et al.
Published: (2025)

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models
by: Tang, Yiwen, et al.
Published: (2023)

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
by: Yang, Shuai, et al.
Published: (2025)

Beyond Retraining: Training-Free Unknown Class Filtering for Source-Free Open Set Domain Adaptation of Vision-Language Models
by: Li, Yongguang, et al.
Published: (2025)

LightBSR: Towards Lightweight Blind Super-Resolution via Discriminative Implicit Degradation Representation Learning
by: Yuan, Jiang, et al.
Published: (2025)

Masked Diffusion Vision-Language Models for Temporal Action Localization
by: Wang, Fengshun, et al.
Published: (2026)

SpatialBot: Precise Spatial Understanding with Vision Language Models
by: Cai, Wenxiao, et al.
Published: (2024)

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
by: Jiang, Songtao, et al.
Published: (2025)

HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation
by: Jing, Linglin, et al.
Published: (2024)

TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding
by: Yang, Fan, et al.
Published: (2026)

Degradation-Aware Image Enhancement via Vision-Language Classification
by: Cai, Jie, et al.
Published: (2025)

GLAD: Generalizable Tuning for Vision-Language Models
by: Peng, Yuqi, et al.
Published: (2025)

Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation
by: Zhu, Xingyu, et al.
Published: (2026)

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
by: Liu, Junli, et al.
Published: (2025)

LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
by: Qu, Delin, et al.
Published: (2024)

Unified Vision-Language-Action Model
by: Wang, Yuqi, et al.
Published: (2025)

Vehicle Perception from Satellite
by: Zhao, Bin, et al.
Published: (2024)

Enhanced Continual Learning of Vision-Language Models with Model Fusion
by: Gao, Haoyuan, et al.
Published: (2025)

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation
by: Zhang, Junjie, et al.
Published: (2024)

Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation
by: Yao, Yuanqi, et al.
Published: (2025)

Transferable 3D Adversarial Shape Completion using Diffusion Models
by: Dai, Xuelong, et al.
Published: (2024)

EVLM: An Efficient Vision-Language Model for Visual Understanding
by: Chen, Kaibing, et al.
Published: (2024)

Enhance Vision-Language Alignment with Noise
by: Huang, Sida, et al.
Published: (2024)

AMMKD: Adaptive Multimodal Multi-teacher Distillation for Lightweight Vision-Language Models
by: Li, Yuqi, et al.
Published: (2025)

Improving Transferable Targeted Attacks with Feature Tuning Mixup
by: Liang, Kaisheng, et al.
Published: (2024)

MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation
by: Zhang, Pingrui, et al.
Published: (2025)

Gradient-Free Adversarial Purification with Diffusion Models
by: Dai, Xuelong, et al.
Published: (2025)

Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding
by: Tang, Yiwen, et al.
Published: (2024)

UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
by: Zhang, Chi, et al.
Published: (2025)

Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding
by: Ma, Jingtian, et al.
Published: (2025)

Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
by: Li, Yueyan, et al.
Published: (2025)

Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function
by: Zhuang, Chenyi, et al.
Published: (2024)