:: Library Catalog

Bibliographic Details
Main Authors:	Bourigault, Emmanuelle; Bourigault, Pauline
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2508.04469

Similar Items

MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View
by: Bourigault, Emmanuelle, et al.
Published: (2024)

World Models in Words: Auditing Physical State-Transition Commitments in Vision-Language Models
by: Bourigault, Emmanuelle
Published: (2026)

UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation
by: Bourigault, Emmanuelle, et al.
Published: (2025)

X-Diffusion: Generating Detailed 3D MRI Volumes From a Single Image Using Cross-Sectional Diffusion Models
by: Bourigault, Emmanuelle, et al.
Published: (2024)

3D Spine Shape Estimation from Single 2D DXA
by: Bourigault, Emmanuelle, et al.
Published: (2024)

Dynamic Embedding of Hierarchical Visual Features for Efficient Vision-Language Fine-Tuning
by: Wei, Xinyu, et al.
Published: (2025)

PhenoLIP: Integrating Phenotype Ontology Knowledge into Medical Vision-Language Pretraining
by: Liang, Cheng, et al.
Published: (2026)

Beyond Attention Magnitude: Leveraging Inter-layer Rank Consistency for Efficient Vision-Language-Action Models
by: Liu, Peiju, et al.
Published: (2026)

On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning
by: Kim, Geewook, et al.
Published: (2024)

TTL: Test-time Textual Learning for OOD Detection with Pretrained Vision-Language Models
by: Ye, Jinlun, et al.
Published: (2026)

Diagnosing Vision Language Models' Perception by Leveraging Human Methods for Color Vision Deficiencies
by: Hayashi, Kazuki, et al.
Published: (2025)

Lost in Embeddings: Information Loss in Vision-Language Models
by: Li, Wenyan, et al.
Published: (2025)

Vision-Language Models Do Not Understand Negation
by: Alhamoud, Kumail, et al.
Published: (2025)

Vision-and-Language Navigation Generative Pretrained Transformer
by: Hanlin, Wen
Published: (2024)

Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays
by: Cho, Yeongjae, et al.
Published: (2024)

Do Vision-Language Models Really Understand Visual Language?
by: Hou, Yifan, et al.
Published: (2024)

No Tokens Wasted: Leveraging Long Context in Biomedical Vision-Language Models
by: Sun, Min Woo, et al.
Published: (2025)

Do Vision-Language Models Understand Visual Persuasiveness?
by: Park, Gyuwon
Published: (2025)

Do Vision-Language Models Understand Compound Nouns?
by: Kumar, Sonal, et al.
Published: (2024)

Understanding Museum Exhibits using Vision-Language Reasoning
by: Balauca, Ada-Astrid, et al.
Published: (2024)

Leveraging Vision-Language Pre-training for Human Activity Recognition in Still Images
by: Mahanta, Cristina, et al.
Published: (2025)

Risk-Controlled Lean-as-Judge for Natural-Language Mathematical Reasoning
by: Bourigault, Pauline, et al.
Published: (2026)

Can Large Vision-Language Models Understand Multimodal Sarcasm?
by: Wang, Xinyu, et al.
Published: (2025)

PUMGPT: A Large Vision-Language Model for Product Understanding
by: Xue, Wei, et al.
Published: (2023)

Inference-Time Structural Reasoning for Compositional Vision-Language Understanding
by: Bhattacharya, Amartya
Published: (2026)

Toward Interactive Regional Understanding in Vision-Large Language Models
by: Lee, Jungbeom, et al.
Published: (2024)

Renaissance: Investigating the Pretraining of Vision-Language Encoders
by: Fields, Clayton, et al.
Published: (2024)

Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
by: Agrawal, Aakriti, et al.
Published: (2025)

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
by: Zhang, Wenqi, et al.
Published: (2025)

Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks
by: Luo, Yaxin, et al.
Published: (2026)

EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning
by: Ma, Mingjie, et al.
Published: (2024)

Index-Preserving Lightweight Token Pruning for Efficient Document Understanding in Vision-Language Models
by: Son, Jaemin, et al.
Published: (2025)

Modeling Caption Diversity in Contrastive Vision-Language Pretraining
by: Lavoie, Samuel, et al.
Published: (2024)

VULCA-Bench: A Multicultural Vision-Language Benchmark for Evaluating Cultural Understanding
by: Yu, Haorui, et al.
Published: (2026)

CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
by: Wang, Yuxuan, et al.
Published: (2024)

VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models
by: Kumar, Gokul Karthik, et al.
Published: (2025)

Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models
by: Horawalavithana, Sameera, et al.
Published: (2026)

Benchmarking Vision Language Models for Cultural Understanding
by: Nayak, Shravan, et al.
Published: (2024)

HyperGVL: Benchmarking and Improving Large Vision-Language Models in Hypergraph Understanding and Reasoning
by: Wei, Yanbin, et al.
Published: (2026)

Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning
by: Zhu, Yingjie, et al.
Published: (2024)