| Main Authors: | Bourigault, Emmanuelle; Bourigault, Pauline |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.04469 |
Similar Items
MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View
by: Bourigault, Emmanuelle, et al.
Published: (2024)
World Models in Words: Auditing Physical State-Transition Commitments in Vision-Language Models
by: Bourigault, Emmanuelle
Published: (2026)
UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation
by: Bourigault, Emmanuelle, et al.
Published: (2025)
X-Diffusion: Generating Detailed 3D MRI Volumes From a Single Image Using Cross-Sectional Diffusion Models
by: Bourigault, Emmanuelle, et al.
Published: (2024)
3D Spine Shape Estimation from Single 2D DXA
by: Bourigault, Emmanuelle, et al.
Published: (2024)
Dynamic Embedding of Hierarchical Visual Features for Efficient Vision-Language Fine-Tuning
by: Wei, Xinyu, et al.
Published: (2025)
PhenoLIP: Integrating Phenotype Ontology Knowledge into Medical Vision-Language Pretraining
by: Liang, Cheng, et al.
Published: (2026)
Beyond Attention Magnitude: Leveraging Inter-layer Rank Consistency for Efficient Vision-Language-Action Models
by: Liu, Peiju, et al.
Published: (2026)
On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning
by: Kim, Geewook, et al.
Published: (2024)
TTL: Test-time Textual Learning for OOD Detection with Pretrained Vision-Language Models
by: Ye, Jinlun, et al.
Published: (2026)
Diagnosing Vision Language Models' Perception by Leveraging Human Methods for Color Vision Deficiencies
by: Hayashi, Kazuki, et al.
Published: (2025)
Lost in Embeddings: Information Loss in Vision-Language Models
by: Li, Wenyan, et al.
Published: (2025)
Vision-Language Models Do Not Understand Negation
by: Alhamoud, Kumail, et al.
Published: (2025)
Vision-and-Language Navigation Generative Pretrained Transformer
by: Hanlin, Wen
Published: (2024)
Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays
by: Cho, Yeongjae, et al.
Published: (2024)
Do Vision-Language Models Really Understand Visual Language?
by: Hou, Yifan, et al.
Published: (2024)
No Tokens Wasted: Leveraging Long Context in Biomedical Vision-Language Models
by: Sun, Min Woo, et al.
Published: (2025)
Do Vision-Language Models Understand Visual Persuasiveness?
by: Park, Gyuwon
Published: (2025)
Do Vision-Language Models Understand Compound Nouns?
by: Kumar, Sonal, et al.
Published: (2024)
Understanding Museum Exhibits using Vision-Language Reasoning
by: Balauca, Ada-Astrid, et al.
Published: (2024)
Leveraging Vision-Language Pre-training for Human Activity Recognition in Still Images
by: Mahanta, Cristina, et al.
Published: (2025)
Risk-Controlled Lean-as-Judge for Natural-Language Mathematical Reasoning
by: Bourigault, Pauline, et al.
Published: (2026)
Can Large Vision-Language Models Understand Multimodal Sarcasm?
by: Wang, Xinyu, et al.
Published: (2025)
PUMGPT: A Large Vision-Language Model for Product Understanding
by: Xue, Wei, et al.
Published: (2023)
Inference-Time Structural Reasoning for Compositional Vision-Language Understanding
by: Bhattacharya, Amartya
Published: (2026)
Toward Interactive Regional Understanding in Vision-Large Language Models
by: Lee, Jungbeom, et al.
Published: (2024)
Renaissance: Investigating the Pretraining of Vision-Language Encoders
by: Fields, Clayton, et al.
Published: (2024)
Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
by: Agrawal, Aakriti, et al.
Published: (2025)
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
by: Zhang, Wenqi, et al.
Published: (2025)
Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks
by: Luo, Yaxin, et al.
Published: (2026)
EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning
by: Ma, Mingjie, et al.
Published: (2024)
Index-Preserving Lightweight Token Pruning for Efficient Document Understanding in Vision-Language Models
by: Son, Jaemin, et al.
Published: (2025)
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
by: Lavoie, Samuel, et al.
Published: (2024)
VULCA-Bench: A Multicultural Vision-Language Benchmark for Evaluating Cultural Understanding
by: Yu, Haorui, et al.
Published: (2026)
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
by: Wang, Yuxuan, et al.
Published: (2024)
VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models
by: Kumar, Gokul Karthik, et al.
Published: (2025)
Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models
by: Horawalavithana, Sameera, et al.
Published: (2026)
Benchmarking Vision Language Models for Cultural Understanding
by: Nayak, Shravan, et al.
Published: (2024)
HyperGVL: Benchmarking and Improving Large Vision-Language Models in Hypergraph Understanding and Reasoning
by: Wei, Yanbin, et al.
Published: (2026)
Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning
by: Zhu, Yingjie, et al.
Published: (2024)