| Main Author: | Nortje, Leanne |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.02865 |
Similar Items
Improved Visually Prompted Keyword Localisation in Real Low-Resource Settings
by: Nortje, Leanne, et al.
Published: (2024)
Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models
by: Khorrami, Khazar, et al.
Published: (2021)
Grounding Language Models for Visual Entity Recognition
by: Xiao, Zilin, et al.
Published: (2024)
Visually Grounded Speech Models have a Mutual Exclusivity Bias
by: Nortje, Leanne, et al.
Published: (2024)
VividMed: Vision Language Model with Versatile Visual Grounding for Medicine
by: Luo, Lingxiao, et al.
Published: (2024)
Vision-Language Modeling in PET/CT for Visual Grounding of Positive Findings
by: Huemann, Zachary, et al.
Published: (2025)
Phoneme-Level Visual Speech Recognition via Point-Visual Fusion and Language Model Reconstruction
by: Teng, Matthew Kit Khinn, et al.
Published: (2025)
GroundingGPT:Language Enhanced Multi-modal Grounding Model
by: Li, Zhaowei, et al.
Published: (2024)
The Mechanistic Emergence of Symbol Grounding in Language Models
by: Wu, Shuyu, et al.
Published: (2025)
Bootstrapping Action-Grounded Visual Dynamics in Unified Vision-Language Models
by: Qiu, Yifu, et al.
Published: (2025)
Weakly-Supervised 3D Visual Grounding based on Visual Language Alignment
by: Xu, Xiaoxu, et al.
Published: (2023)
NAVCON: A Cognitively Inspired and Linguistically Grounded Corpus for Vision and Language Navigation
by: Wanchoo, Karan, et al.
Published: (2024)
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
by: Gimeno-Gómez, David, et al.
Published: (2024)
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding
by: Zhao, Haoyu, et al.
Published: (2024)
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models
by: Rajabi, Navid, et al.
Published: (2023)
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
by: Li, Yunxin, et al.
Published: (2024)
Generalizable Entity Grounding via Assistance of Large Language Model
by: Qi, Lu, et al.
Published: (2024)
Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models
by: Chen, Jiaxing, et al.
Published: (2024)
Show and Guide: Instructional-Plan Grounded Vision and Language Model
by: Glória-Silva, Diogo, et al.
Published: (2024)
Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions
by: Li, Jun, et al.
Published: (2025)
Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models
by: Lu, Jiaying, et al.
Published: (2023)
Visual Representations inside the Language Model
by: Liu, Benlin, et al.
Published: (2025)
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
by: Xiao, Han, et al.
Published: (2025)
Why are Visually-Grounded Language Models Bad at Image Classification?
by: Zhang, Yuhui, et al.
Published: (2024)
Towards Visual Text Grounding of Multimodal Large Language Model
by: Li, Ming, et al.
Published: (2025)
Decomposed On-Policy Distillation for Vision-Language Reasoning: Steering Gradients for Visual Grounding
by: Yoon, Hee Suk, et al.
Published: (2026)
Clean Evaluations on Contaminated Visual Language Models
by: Lu, Hongyuan, et al.
Published: (2024)
Do Vision-Language Models Really Understand Visual Language?
by: Hou, Yifan, et al.
Published: (2024)
Visually grounded few-shot word learning in low-resource settings
by: Nortje, Leanne, et al.
Published: (2023)
Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models
by: Wang, Tianyu, et al.
Published: (2024)
SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
by: Wang, Siting, et al.
Published: (2025)
Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge
by: Wang, Yuxuan, et al.
Published: (2024)
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?
by: Geigle, Gregor, et al.
Published: (2024)
Entropy-Gradient Grounding: Training-Free Evidence Retrieval in Vision-Language Models
by: Gröpl, Marcel, et al.
Published: (2026)
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
by: Kojima, Noriyuki, et al.
Published: (2023)
MAGIC: Multimodal Alignment & Grounding-aware Instruction Coreset for Vision-Language Models
by: Biswas, Shristi Das, et al.
Published: (2026)
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
by: Hu, Yushi, et al.
Published: (2024)
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
by: Ma, Chuofan, et al.
Published: (2024)
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
by: Prasad, Archiki, et al.
Published: (2023)
First Logit Boosting: Visual Grounding Method to Mitigate Object Hallucination in Large Vision-Language Models
by: Ha, Jiwoo, et al.
Published: (2026)