:: Library Catalog

Bibliographic Details
Main Authors:	Lahmi, Jules; Roger, Alexis
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.10336

Similar Items

Text2VLM: Adapting Text-Only Datasets to Evaluate Alignment Training in Visual Language Models
by: Downer, Gabriel, et al.
Published: (2025)

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
by: Jiang, Ziyan, et al.
Published: (2024)

Aligning VLM Assistants with Personalized Situated Cognition
by: Li, Yongqi, et al.
Published: (2025)

English is Not All You Need: Systematically Exploring the Role of Multilinguality in LLM Post-Training
by: Dhaliwal, Mehak, et al.
Published: (2026)

Autonomous Frontier-Based Exploration with VLM Guidance
by: Aitha, Aarush, et al.
Published: (2026)

PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis
by: Lokesh, K, et al.
Published: (2026)

DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
by: Singh, Aditya Kumar, et al.
Published: (2026)

VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
by: Wang, Kangrui, et al.
Published: (2025)

The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning
by: Schoepp, Sheila, et al.
Published: (2025)

OSPC: Artificial VLM Features for Hateful Meme Detection
by: Grönquist, Peter
Published: (2024)

Navigation with VLM framework: Towards Going to Any Language
by: Yin, Zecheng, et al.
Published: (2024)

Hybrid Decision Making via Conformal VLM-generated Guidance
by: Banerjee, Debodeep, et al.
Published: (2026)

MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding
by: Wu, Qinzhuo, et al.
Published: (2024)

Multilingual Training and Evaluation Resources for Vision-Language Models
by: Baiamonte, Daniela, et al.
Published: (2026)

Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning
by: Hao, Yilun, et al.
Published: (2025)

StreamingVLM: Real-Time Understanding for Infinite Video Streams
by: Xu, Ruyi, et al.
Published: (2025)

Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning
by: Huang, Yihong, et al.
Published: (2026)

BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
by: Wang, Shengao, et al.
Published: (2025)

Mobile-Bench-v2: A More Realistic and Comprehensive Benchmark for VLM-based Mobile Agents
by: Xu, Weikai, et al.
Published: (2025)

Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding
by: Omasa, Takamitsu, et al.
Published: (2025)

GazeVLM: Active Vision via Internal Attention Control for Multimodal Reasoning
by: Ebouky, Brown, et al.
Published: (2026)

Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation
by: Yu, Seonghoon, et al.
Published: (2026)

Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning
by: Singh, Ayush, et al.
Published: (2024)

PaliGemma: A versatile 3B VLM for transfer
by: Beyer, Lucas, et al.
Published: (2024)

VLURes: Benchmarking VLM Visual and Linguistic Understanding in Low-Resource Languages
by: Atuhurra, Jesse, et al.
Published: (2025)

The Roles of English in Evaluating Multilingual Language Models
by: Poelman, Wessel, et al.
Published: (2024)

LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications
by: Zhang, Danqing, et al.
Published: (2025)

SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models
by: Kapadnis, Manav Nitin, et al.
Published: (2024)

AdaptFuse: Training-Free Sequential Preference Learning via Externalized Bayesian Inference
by: Lin, Fangzhou, et al.
Published: (2026)

Gaperon: A Peppered English-French Generative Language Model Suite
by: Godey, Nathan, et al.
Published: (2025)

Classification of Human- and AI-Generated Texts for English, French, German, and Spanish
by: Schaaff, Kristina, et al.
Published: (2023)

AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding
by: Suglia, Alessandro, et al.
Published: (2024)

CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception
by: Carvalho, Miguel, et al.
Published: (2025)

Adapting Multilingual LLMs to Low-Resource Languages with Knowledge Graphs via Adapters
by: Gurgurov, Daniil, et al.
Published: (2024)

MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost
by: Xing, Sen, et al.
Published: (2024)

Toward In-Context Teaching: Adapting Examples to Students' Misconceptions
by: Ross, Alexis, et al.
Published: (2024)

Methodology of Adapting Large English Language Models for Specific Cultural Contexts
by: Zhang, Wenjing, et al.
Published: (2024)

BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning
by: Zhan, Qiusi, et al.
Published: (2025)

CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries
by: Liu, Shudong, et al.
Published: (2025)

SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
by: Ji, Yicheng, et al.
Published: (2025)