| Main Authors: | Pate, Seth; Wong, Lawson L. S. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.03900 |
Similar Items
VISTAv2: World Imagination for Indoor Vision-and-Language Navigation
by: Huang, Yanjia, et al.
Published: (2025)
Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models
by: Dinkar, Tanvi, et al.
Published: (2025)
Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation
by: Raistrick, Alexander, et al.
Published: (2024)
CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation
by: Su, Xia, et al.
Published: (2026)
VLN-Pilot: Large Vision-Language Model as an Autonomous Indoor Drone Operator
by: Dominguez-Dager, Bessie, et al.
Published: (2026)
Vision-Based Localization and LLM-based Navigation for Indoor Environments
by: Rahimi, Keyan, et al.
Published: (2025)
Optimizing Vision-Language Interactions Through Decoder-Only Models
by: Tanaka, Kaito, et al.
Published: (2024)
Towards a Systematic Evaluation of Hallucinations in Large-Vision Language Models
by: Seth, Ashish, et al.
Published: (2024)
PlanarGS: High-Fidelity Indoor 3D Gaussian Splatting Guided by Vision-Language Planar Priors
by: Jin, Xirui, et al.
Published: (2025)
Locality Alignment Improves Vision-Language Models
by: Covert, Ian, et al.
Published: (2024)
Mechanisms of Object Localization in Vision-Language Models
by: Schaumlöffel, Timothy, et al.
Published: (2026)
Enhancing Vision-Language Navigation with Multimodal Event Knowledge from Real-World Indoor Tour Videos
by: Xu, Haoxuan, et al.
Published: (2026)
Hybrid Mesh-Gaussian Representation for Efficient Indoor Scene Reconstruction
by: Huang, Binxiao, et al.
Published: (2025)
Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation
by: Qi, Yu, et al.
Published: (2025)
MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives
by: Ikezogwo, Wisdom O., et al.
Published: (2025)
Masked Diffusion Vision-Language Models for Temporal Action Localization
by: Wang, Fengshun, et al.
Published: (2026)
Multi-Modal Interpretability for Enhanced Localization in Vision-Language Models
by: Imran, Muhammad, et al.
Published: (2025)
Multimodal Indoor Localization Using Crowdsourced Radio Maps
by: Yi, Zhaoguang, et al.
Published: (2023)
ELIP: Efficient Discriminative Language-Image Pre-training with Fewer Vision Tokens
by: Guo, Yangyang, et al.
Published: (2023)
Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization
by: Lim, Geuntaek, et al.
Published: (2024)
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
by: Chen, Shiming, et al.
Published: (2025)
HalLoc: Token-level Localization of Hallucinations for Vision Language Models
by: Park, Eunkyu, et al.
Published: (2025)
Global-Local Tree Search in VLMs for 3D Indoor Scene Generation
by: Deng, Wei, et al.
Published: (2025)
Fooling Polarization-based Vision using Locally Controllable Polarizing Projection
by: Li, Zhuoxiao, et al.
Published: (2023)
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
by: Zhao, Shihao, et al.
Published: (2024)
Rooms from Motion: Un-posed Indoor 3D Object Detection as Localization and Mapping
by: Lazarow, Justin, et al.
Published: (2025)
SpatialLM: Training Large Language Models for Structured Indoor Modeling
by: Mao, Yongsen, et al.
Published: (2025)
Semantic-Preserving Cross-Style Visual Reasoning for Robust Multi-Modal Understanding in Large Vision-Language Models
by: Nakayama, Aya, et al.
Published: (2025)
Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling
by: Wong, Bryan, et al.
Published: (2025)
3D Gaussian Splatting aided Localization for Large and Complex Indoor-Environments
by: Ress, Vincent, et al.
Published: (2025)
LocalViT: Analyzing Locality in Vision Transformers
by: Li, Yawei, et al.
Published: (2021)
Zero-shot Action Localization via the Confidence of Large Vision-Language Models
by: Aklilu, Josiah, et al.
Published: (2024)
GalLoP: Learning Global and Local Prompts for Vision-Language Models
by: Lafon, Marc, et al.
Published: (2024)
Towards Mitigating Modality Bias in Vision-Language Models for Temporal Action Localization
by: Li, Jiaqi, et al.
Published: (2026)
Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance
by: Zhang, Mingfang, et al.
Published: (2025)
VLM-Loc: Localization in Point Cloud Maps via Vision-Language Models
by: Kang, Shuhao, et al.
Published: (2026)
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
by: Zhou, Zewei, et al.
Published: (2025)
MarineEval: Assessing the Marine Intelligence of Vision-Language Models
by: Wong, YuK-Kwan, et al.
Published: (2025)
SceneNAT: Masked Generative Modeling for Language-Guided Indoor Scene Synthesis
by: Choi, Jeongjun, et al.
Published: (2026)
Local Statistics for Generative Image Detection
by: Wong, Yung Jer, et al.
Published: (2023)