| Main Authors: | Valls, Antoni, Sanchez-Riera, Jordi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.11782 |
Similar Items
Uncertainty-Aware Gaussian Map for Vision-Language Navigation
by: Gao, Jianzhe, et al.
Published: (2026)
Exploring the Use of VLMs for Navigation Assistance for People with Blindness and Low Vision
by: Li, Yu, et al.
Published: (2026)
Vision-Based Risk Aware Emergency Landing for UAVs in Complex Urban Environments
by: de la Torre-Vanegas, Julio, et al.
Published: (2025)
Vision-Based Autonomous UAV Navigation and Landing for Urban Search and Rescue
by: Mittal, Mayank, et al.
Published: (2019)
Empowering Dynamic Urban Navigation with Stereo and Mid-Level Vision
by: Zhou, Wentao, et al.
Published: (2025)
Ask4VG: Risk-Aware Question Selection for Reducing Prior-Driven Answers in Medical VQA
by: Zhu, Xiaorong, et al.
Published: (2026)
Embodied Scene Understanding for Vision Language Models via MetaVQA
by: Wang, Weizhen, et al.
Published: (2025)
InstantAvatar: Efficient 3D Head Reconstruction via Surface Rendering
by: Canela, Antonio, et al.
Published: (2023)
Verification Mirage: Mapping the Reliability Boundary of Self-Verification in Medical VQA
by: Jin, Ruinan, et al.
Published: (2026)
Do Vision-Language Models See Urban Scenes as People Do? An Urban Perception Benchmark
by: Mushkani, Rashid
Published: (2025)
Multi-View People Detection in Large Scenes via Supervised View-Wise Contribution Weighting
by: Zhang, Qi, et al.
Published: (2024)
VTAgent: Agentic Keyframe Anchoring for Evidence-Aware Video TextVQA
by: He, Haibin, et al.
Published: (2026)
BEA-GS: BEyond RAdiance Supervision in 3DGS for Precise Object Extraction
by: Mazzucchelli, Alessio, et al.
Published: (2026)
Uncertainty-Aware Vision-based Risk Object Identification via Conformal Risk Tube Prediction
by: Fu, Kai-Yu, et al.
Published: (2026)
Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming
by: Zhou, Yue, et al.
Published: (2026)
LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation
by: Ning, Yuwei, et al.
Published: (2026)
IRIS: Intent Resolution via Inference-time Saccades for Open-Ended VQA in Large Vision-Language Models
by: Madinei, Parsa, et al.
Published: (2026)
Identifying Crucial Objects in Blind and Low-Vision Individuals' Navigation
by: Islam, Md Touhidul, et al.
Published: (2024)
UrbanSAM: Learning Invariance-Inspired Adapters for Segment Anything Models in Urban Construction
by: Li, Chenyu, et al.
Published: (2025)
CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering
by: Hong, Yuyang, et al.
Published: (2026)
EventFlash: Towards Efficient MLLMs for Event-Based Vision
by: Liu, Shaoyu, et al.
Published: (2026)
EduVQA: Towards Concept-Aware Assessment of Educational AI-Generated Videos
by: Chen, Baoliang, et al.
Published: (2026)
Aerial Vision-and-Language Navigation with Grid-based View Selection and Map Construction
by: Zhao, Ganlong, et al.
Published: (2025)
Dynamic Topology Awareness: Breaking the Granularity Rigidity in Vision-Language Navigation
by: Peng, Jiankun, et al.
Published: (2026)
Vision-Language Navigation with Energy-Based Policy
by: Liu, Rui, et al.
Published: (2024)
SurgCheck: Do Vision-Language Models Really Look at Images in Surgical VQA?
by: Shin, Jongmin, et al.
Published: (2026)
Head-Aware Visual Cropping: Enhancing Fine-Grained VQA with Attention-Guided Subimage
by: Xie, Junfei, et al.
Published: (2026)
Low-Latency Scalable Streaming for Event-Based Vision
by: Hamara, Andrew, et al.
Published: (2024)
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models
by: Shahgir, Haz Sameen, et al.
Published: (2024)
Look to Locate: Vision-Based Multisensory Navigation with 3-D Digital Maps for GNSS-Challenged Environments
by: Elmaghraby, Ola, et al.
Published: (2025)
Enhancing Document VQA Models via Retrieval-Augmented Generation
by: López, Eric, et al.
Published: (2025)
AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation
by: Guo, Wenxuan, et al.
Published: (2026)
Improving Medical VQA through Trajectory-Aware Process Supervision
by: Gulluk, Halil Ibrahim, et al.
Published: (2026)
3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation
by: Gao, Jianzhe, et al.
Published: (2026)
MapDream: Task-Driven Map Learning for Vision-Language Navigation
by: Lian, Guoxin, et al.
Published: (2026)
Interpreting Low-level Vision Models with Causal Effect Maps
by: Hu, Jinfan, et al.
Published: (2024)
Vision-Based Localization in Dense Urban Environments: A Case Study of an Urban Village in China
by: Wu, Menglin, et al.
Published: (2026)
On the Role of Visual Grounding in VQA
by: Reich, Daniel, et al.
Published: (2024)
PASTS: Progress-Aware Spatio-Temporal Transformer Speaker For Vision-and-Language Navigation
by: Wang, Liuyi, et al.
Published: (2023)
From Street View to Visual Network: Mapping the Visibility of Urban Landmarks with Vision-Language Models
by: Fan, Zicheng, et al.
Published: (2025)