| Main Authors: | Peng, Jiankun, Guo, Jianyuan, Xu, Ying, Liu, Yue, Yan, Jiashuang, Ye, Xuanwei, Li, Houhua, Wang, Xiaoming |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.21751 |
Similar Items
LCGNav: Local Candidate-Aware Geometric Enhancement for General Topological Planning in Vision-Language Navigation
by: Peng, Jiankun, et al.
Published: (2026)
Uncertainty-Aware Gaussian Map for Vision-Language Navigation
by: Gao, Jianzhe, et al.
Published: (2026)
TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation
by: Liu, Jiaxing, et al.
Published: (2026)
Semantic Granularity Navigation in Image Editing
by: Lu, Liangsi, et al.
Published: (2026)
AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation
by: Guo, Wenxuan, et al.
Published: (2026)
Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments
by: Chen, Kehan, et al.
Published: (2024)
World-Consistent Data Generation for Vision-and-Language Navigation
by: Zhong, Yu, et al.
Published: (2024)
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
by: Guo, Jianyuan, et al.
Published: (2024)
Bridging Sign and Spoken Languages: Pseudo Gloss Generation for Sign Language Translation
by: Guo, Jianyuan, et al.
Published: (2025)
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
by: Hao, Zhiwei, et al.
Published: (2024)
PA-Attack: Guiding Gray-Box Attacks on LVLM Vision Encoders with Prototypes and Attention
by: Mei, Hefei, et al.
Published: (2026)
GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting
by: Peng, Yuning, et al.
Published: (2024)
Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents
by: Ma, Tianyi, et al.
Published: (2025)
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
by: Hou, Xinyu, et al.
Published: (2024)
PASTS: Progress-Aware Spatio-Temporal Transformer Speaker For Vision-and-Language Navigation
by: Wang, Liuyi, et al.
Published: (2023)
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
by: Guo, Xiao, et al.
Published: (2025)
DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation
by: Liu, Ting, et al.
Published: (2023)
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions
by: Li, Heng, et al.
Published: (2024)
ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments
by: An, Dong, et al.
Published: (2023)
LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation
by: Ning, Yuwei, et al.
Published: (2026)
Dynamic Granularity Matters: Rethinking Vision Transformers Beyond Fixed Patch Splitting
by: Yu, Qiyang, et al.
Published: (2025)
Granular Privacy Control for Geolocation with Vision Language Models
by: Mendes, Ethan, et al.
Published: (2024)
Cluster-Aware Neural Collapse Prompt Tuning for Long-Tailed Generalization of Vision-Language Models
by: Guo, Boyang, et al.
Published: (2026)
Topology-Aware Layer Pruning for Large Vision-Language Models
by: Zheng, Pengcheng, et al.
Published: (2026)
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
by: Jia, Ding, et al.
Published: (2024)
Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free Unsupervised Domain Adaptation
by: Deng, Peihua, et al.
Published: (2024)
PanoGen++: Domain-Adapted Text-Guided Panoramic Environment Generation for Vision-and-Language Navigation
by: Wang, Sen, et al.
Published: (2025)
Structured Observation Language for Efficient and Generalizable Vision-Language Navigation
by: Peng, Daojie, et al.
Published: (2026)
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation
by: Xu, Ming, et al.
Published: (2024)
Breaking the Rigid Prior: Towards Articulated 3D Anomaly Detection
by: Gan, Jinye, et al.
Published: (2026)
GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation
by: Yang, Jiahao, et al.
Published: (2026)
Breaking the Encoder Barrier for Seamless Video-Language Understanding
by: Li, Handong, et al.
Published: (2025)
Data-efficient Large Vision Models through Sequential Autoregression
by: Guo, Jianyuan, et al.
Published: (2024)
M2EF-NNs: Multimodal Multi-instance Evidence Fusion Neural Networks for Cancer Survival Prediction
by: Luo, Hui, et al.
Published: (2024)
FDiff-Fusion: Denoising diffusion fusion network based on fuzzy learning for 3D medical image segmentation
by: Ding, Weiping, et al.
Published: (2024)
From Static to Dynamic: a Survey of Topology-Aware Perception in Autonomous Driving
by: Chen, Yixiao, et al.
Published: (2025)
NaVIDA: Vision-Language Navigation with Inverse Dynamics Augmentation
by: Zhu, Weiye, et al.
Published: (2026)
RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts
by: Liu, Xu, et al.
Published: (2024)
GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation
by: Yin, Hang, et al.
Published: (2025)
Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning
by: Jie, Shibo, et al.
Published: (2024)