| Main Authors: | Yu, Yongqiang, Li, Xuhui, Mahmood, Hazza, Zhou, Jinxing, Hong, Haodong, Jiang, Longtao, Xu, Zhiqiang, Wu, Qi, Chang, Xiaojun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.10322 |
Similar Items
AgriChain Visually Grounded Expert Verified Reasoning for Interpretable Agricultural Vision Language Models
by: Mahmood, Hazza, et al.
Published: (2026)
General Scene Adaptation for Vision-and-Language Navigation
by: Hong, Haodong, et al.
Published: (2025)
Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments
by: Li, Zerui, et al.
Published: (2025)
Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
by: Hong, Haodong, et al.
Published: (2024)
Path-Guided Flow Matching for Dataset Distillation
by: Li, Xuhui, et al.
Published: (2026)
Measuring Social Bias in Vision-Language Models with Face-Only Counterfactuals from Real Photos
by: Chen, Haodong, et al.
Published: (2026)
Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection
by: Han, Mingfei, et al.
Published: (2025)
Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments
by: Hong, Haodong, et al.
Published: (2024)
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
by: Zhou, Gengze, et al.
Published: (2024)
Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
by: Zhou, Jinxing, et al.
Published: (2024)
Mettle: Meta-Token Learning for Memory-Efficient Audio-Visual Adaptation
by: Zhou, Jinxing, et al.
Published: (2025)
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning
by: Lin, Bingqian, et al.
Published: (2024)
Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective
by: Wang, Qiaosi, et al.
Published: (2025)
LatentPilot: Scene-Aware Vision-and-Language Navigation by Dreaming Ahead with Latent Visual Reasoning
by: Hao, Haihong, et al.
Published: (2026)
Vision-Language Navigation with Continual Learning
by: Li, Zhiyuan, et al.
Published: (2024)
A Vision for AI-Driven Adaptation of Dynamic AR Content to Users and Environments
by: Rasch, Julian, et al.
Published: (2025)
GeoDM: Geometry-aware Distribution Matching for Dataset Distillation
by: Li, Xuhui, et al.
Published: (2025)
CMMR-VLN: Vision-and-Language Navigation via Continual Multimodal Memory Retrieval
by: Li, Haozhou, et al.
Published: (2026)
Token Painter: Training-Free Text-Guided Image Inpainting via Mask Autoregressive Models
by: Jiang, Longtao, et al.
Published: (2025)
Vision-Language Model Selection and Reuse for Downstream Adaptation
by: Tan, Hao-Zhe, et al.
Published: (2025)
Landmark-Guided Knowledge for Vision-and-Language Navigation
by: Yang, Dongsheng, et al.
Published: (2025)
All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation
by: Wang, Xudong, et al.
Published: (2026)
Test-Time Adaptation for Tactile-Vision-Language Models
by: Ye, Chuyang, et al.
Published: (2026)
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
by: Yang, Cehao, et al.
Published: (2025)
UniHOI: Unified Human-Object Interaction Understanding via Unified Token Space
by: Yang, Panqi, et al.
Published: (2025)
Feedback-Driven Vision-Language Alignment with Minimal Human Supervision
by: Giannone, Giorgio, et al.
Published: (2025)
GoodPoint: Learning Constructive Scientific Paper Feedback from Author Responses
by: Mun, Jimin, et al.
Published: (2026)
Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning
by: Li, Jiajie, et al.
Published: (2026)
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
by: Ye, Junjie, et al.
Published: (2025)
IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments
by: Liu, Xu, et al.
Published: (2025)
MapDream: Task-Driven Map Learning for Vision-Language Navigation
by: Lian, Guoxin, et al.
Published: (2026)
Routing, Cascades, and User Choice for LLMs
by: Mahmood, Rafid
Published: (2026)
What Limits Vision-and-Language Navigation?
by: Wang, Yunheng, et al.
Published: (2026)
GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback
by: Lee, Sungjae, et al.
Published: (2025)
Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval
by: Ma, Shengjie, et al.
Published: (2024)
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions
by: Li, Heng, et al.
Published: (2024)
SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation
by: Zhang, Jiwen, et al.
Published: (2026)
Evolving Prompt Adaptation for Vision-Language Models
by: Zhang, Enming, et al.
Published: (2026)
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
by: Zhou, Gengze, et al.
Published: (2024)
Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation
by: Wang, Zehao, et al.
Published: (2024)