| Main Authors: | Lan, Guanzhou; Liao, Chenyi; Yang, Yuqi; Ma, Qianli; Wang, Zhigang; Wang, Dong; Zhao, Bin; Li, Xuelong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.04565 |
Similar Items
Night-to-Day Translation via Illumination Degradation Disentanglement
by: Lan, Guanzhou, et al.
Published: (2024)
Efficient Diffusion as Low Light Enhancer
by: Lan, Guanzhou, et al.
Published: (2024)
Open-Vocabulary Octree-Graph for 3D Scene Understanding
by: Wang, Zhigang, et al.
Published: (2024)
Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation
by: Zhang, Pingrui, et al.
Published: (2025)
UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding
by: Zhang, Da, et al.
Published: (2025)
GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
by: Yan, Chi, et al.
Published: (2023)
Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy
by: Wu, Pengyuan, et al.
Published: (2026)
Q-GeoMem: Question-Guided Geometric Memory for Video Spatial Reasoning
by: Gao, Xianqiang, et al.
Published: (2026)
RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling
by: Gao, Bingjie, et al.
Published: (2025)
Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models
by: Tang, Yiwen, et al.
Published: (2023)
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
by: Yang, Shuai, et al.
Published: (2025)
Beyond Retraining: Training-Free Unknown Class Filtering for Source-Free Open Set Domain Adaptation of Vision-Language Models
by: Li, Yongguang, et al.
Published: (2025)
LightBSR: Towards Lightweight Blind Super-Resolution via Discriminative Implicit Degradation Representation Learning
by: Yuan, Jiang, et al.
Published: (2025)
Masked Diffusion Vision-Language Models for Temporal Action Localization
by: Wang, Fengshun, et al.
Published: (2026)
SpatialBot: Precise Spatial Understanding with Vision Language Models
by: Cai, Wenxiao, et al.
Published: (2024)
Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
by: Jiang, Songtao, et al.
Published: (2025)
HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation
by: Jing, Linglin, et al.
Published: (2024)
TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding
by: Yang, Fan, et al.
Published: (2026)
Degradation-Aware Image Enhancement via Vision-Language Classification
by: Cai, Jie, et al.
Published: (2025)
GLAD: Generalizable Tuning for Vision-Language Models
by: Peng, Yuqi, et al.
Published: (2025)
Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation
by: Zhu, Xingyu, et al.
Published: (2026)
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
by: Liu, Junli, et al.
Published: (2025)
LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
by: Qu, Delin, et al.
Published: (2024)
Unified Vision-Language-Action Model
by: Wang, Yuqi, et al.
Published: (2025)
Vehicle Perception from Satellite
by: Zhao, Bin, et al.
Published: (2024)
Enhanced Continual Learning of Vision-Language Models with Model Fusion
by: Gao, Haoyuan, et al.
Published: (2025)
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation
by: Zhang, Junjie, et al.
Published: (2024)
Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation
by: Yao, Yuanqi, et al.
Published: (2025)
Transferable 3D Adversarial Shape Completion using Diffusion Models
by: Dai, Xuelong, et al.
Published: (2024)
EVLM: An Efficient Vision-Language Model for Visual Understanding
by: Chen, Kaibing, et al.
Published: (2024)
Enhance Vision-Language Alignment with Noise
by: Huang, Sida, et al.
Published: (2024)
AMMKD: Adaptive Multimodal Multi-teacher Distillation for Lightweight Vision-Language Models
by: Li, Yuqi, et al.
Published: (2025)
Improving Transferable Targeted Attacks with Feature Tuning Mixup
by: Liang, Kaisheng, et al.
Published: (2024)
MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation
by: Zhang, Pingrui, et al.
Published: (2025)
Gradient-Free Adversarial Purification with Diffusion Models
by: Dai, Xuelong, et al.
Published: (2025)
Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding
by: Tang, Yiwen, et al.
Published: (2024)
UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
by: Zhang, Chi, et al.
Published: (2025)
Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding
by: Ma, Jingtian, et al.
Published: (2025)
Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
by: Li, Yueyan, et al.
Published: (2025)
Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function
by: Zhuang, Chenyi, et al.
Published: (2024)