:: Library Catalog

Bibliographic Details
Main Authors:	Pate, Seth; Wong, Lawson L. S.
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.03900

Similar Items

VISTAv2: World Imagination for Indoor Vision-and-Language Navigation
by: Huang, Yanjia, et al.
Published: (2025)

Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models
by: Dinkar, Tanvi, et al.
Published: (2025)

Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation
by: Raistrick, Alexander, et al.
Published: (2024)

CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation
by: Su, Xia, et al.
Published: (2026)

VLN-Pilot: Large Vision-Language Model as an Autonomous Indoor Drone Operator
by: Dominguez-Dager, Bessie, et al.
Published: (2026)

Vision-Based Localization and LLM-based Navigation for Indoor Environments
by: Rahimi, Keyan, et al.
Published: (2025)

Optimizing Vision-Language Interactions Through Decoder-Only Models
by: Tanaka, Kaito, et al.
Published: (2024)

Towards a Systematic Evaluation of Hallucinations in Large-Vision Language Models
by: Seth, Ashish, et al.
Published: (2024)

PlanarGS: High-Fidelity Indoor 3D Gaussian Splatting Guided by Vision-Language Planar Priors
by: Jin, Xirui, et al.
Published: (2025)

Locality Alignment Improves Vision-Language Models
by: Covert, Ian, et al.
Published: (2024)

Mechanisms of Object Localization in Vision-Language Models
by: Schaumlöffel, Timothy, et al.
Published: (2026)

Enhancing Vision-Language Navigation with Multimodal Event Knowledge from Real-World Indoor Tour Videos
by: Xu, Haoxuan, et al.
Published: (2026)

Hybrid Mesh-Gaussian Representation for Efficient Indoor Scene Reconstruction
by: Huang, Binxiao, et al.
Published: (2025)

Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation
by: Qi, Yu, et al.
Published: (2025)

MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives
by: Ikezogwo, Wisdom O., et al.
Published: (2025)

Masked Diffusion Vision-Language Models for Temporal Action Localization
by: Wang, Fengshun, et al.
Published: (2026)

Multi-Modal Interpretability for Enhanced Localization in Vision-Language Models
by: Imran, Muhammad, et al.
Published: (2025)

Multimodal Indoor Localization Using Crowdsourced Radio Maps
by: Yi, Zhaoguang, et al.
Published: (2023)

ELIP: Efficient Discriminative Language-Image Pre-training with Fewer Vision Tokens
by: Guo, Yangyang, et al.
Published: (2023)

Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization
by: Lim, Geuntaek, et al.
Published: (2024)

Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
by: Chen, Shiming, et al.
Published: (2025)

HalLoc: Token-level Localization of Hallucinations for Vision Language Models
by: Park, Eunkyu, et al.
Published: (2025)

Global-Local Tree Search in VLMs for 3D Indoor Scene Generation
by: Deng, Wei, et al.
Published: (2025)

Fooling Polarization-based Vision using Locally Controllable Polarizing Projection
by: Li, Zhuoxiao, et al.
Published: (2023)

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
by: Zhao, Shihao, et al.
Published: (2024)

Rooms from Motion: Un-posed Indoor 3D Object Detection as Localization and Mapping
by: Lazarow, Justin, et al.
Published: (2025)

SpatialLM: Training Large Language Models for Structured Indoor Modeling
by: Mao, Yongsen, et al.
Published: (2025)

Semantic-Preserving Cross-Style Visual Reasoning for Robust Multi-Modal Understanding in Large Vision-Language Models
by: Nakayama, Aya, et al.
Published: (2025)

Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling
by: Wong, Bryan, et al.
Published: (2025)

3D Gaussian Splatting aided Localization for Large and Complex Indoor-Environments
by: Ress, Vincent, et al.
Published: (2025)

LocalViT: Analyzing Locality in Vision Transformers
by: Li, Yawei, et al.
Published: (2021)

Zero-shot Action Localization via the Confidence of Large Vision-Language Models
by: Aklilu, Josiah, et al.
Published: (2024)

GalLoP: Learning Global and Local Prompts for Vision-Language Models
by: Lafon, Marc, et al.
Published: (2024)

Towards Mitigating Modality Bias in Vision-Language Models for Temporal Action Localization
by: Li, Jiaqi, et al.
Published: (2026)

Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance
by: Zhang, Mingfang, et al.
Published: (2025)

VLM-Loc: Localization in Point Cloud Maps via Vision-Language Models
by: Kang, Shuhao, et al.
Published: (2026)

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
by: Zhou, Zewei, et al.
Published: (2025)

MarineEval: Assessing the Marine Intelligence of Vision-Language Models
by: Wong, YuK-Kwan, et al.
Published: (2025)

SceneNAT: Masked Generative Modeling for Language-Guided Indoor Scene Synthesis
by: Choi, Jeongjun, et al.
Published: (2026)

Local Statistics for Generative Image Detection
by: Wong, Yung Jer, et al.
Published: (2023)