| Main Authors: | Lahmi, Jules; Roger, Alexis |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.10336 |
Similar Items
Text2VLM: Adapting Text-Only Datasets to Evaluate Alignment Training in Visual Language Models
by: Downer, Gabriel, et al.
Published: (2025)
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
by: Jiang, Ziyan, et al.
Published: (2024)
Aligning VLM Assistants with Personalized Situated Cognition
by: Li, Yongqi, et al.
Published: (2025)
English is Not All You Need: Systematically Exploring the Role of Multilinguality in LLM Post-Training
by: Dhaliwal, Mehak, et al.
Published: (2026)
Autonomous Frontier-Based Exploration with VLM Guidance
by: Aitha, Aarush, et al.
Published: (2026)
PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis
by: Lokesh, K, et al.
Published: (2026)
DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
by: Singh, Aditya Kumar, et al.
Published: (2026)
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
by: Wang, Kangrui, et al.
Published: (2025)
The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning
by: Schoepp, Sheila, et al.
Published: (2025)
OSPC: Artificial VLM Features for Hateful Meme Detection
by: Grönquist, Peter
Published: (2024)
Navigation with VLM framework: Towards Going to Any Language
by: Yin, Zecheng, et al.
Published: (2024)
Hybrid Decision Making via Conformal VLM-generated Guidance
by: Banerjee, Debodeep, et al.
Published: (2026)
MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding
by: Wu, Qinzhuo, et al.
Published: (2024)
Multilingual Training and Evaluation Resources for Vision-Language Models
by: Baiamonte, Daniela, et al.
Published: (2026)
Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning
by: Hao, Yilun, et al.
Published: (2025)
StreamingVLM: Real-Time Understanding for Infinite Video Streams
by: Xu, Ruyi, et al.
Published: (2025)
Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning
by: Huang, Yihong, et al.
Published: (2026)
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
by: Wang, Shengao, et al.
Published: (2025)
Mobile-Bench-v2: A More Realistic and Comprehensive Benchmark for VLM-based Mobile Agents
by: Xu, Weikai, et al.
Published: (2025)
Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding
by: Omasa, Takamitsu, et al.
Published: (2025)
GazeVLM: Active Vision via Internal Attention Control for Multimodal Reasoning
by: Ebouky, Brown, et al.
Published: (2026)
Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation
by: Yu, Seonghoon, et al.
Published: (2026)
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning
by: Singh, Ayush, et al.
Published: (2024)
PaliGemma: A versatile 3B VLM for transfer
by: Beyer, Lucas, et al.
Published: (2024)
VLURes: Benchmarking VLM Visual and Linguistic Understanding in Low-Resource Languages
by: Atuhurra, Jesse, et al.
Published: (2025)
The Roles of English in Evaluating Multilingual Language Models
by: Poelman, Wessel, et al.
Published: (2024)
LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications
by: Zhang, Danqing, et al.
Published: (2025)
SERPENT-VLM: Self-Refining Radiology Report Generation Using Vision Language Models
by: Kapadnis, Manav Nitin, et al.
Published: (2024)
AdaptFuse: Training-Free Sequential Preference Learning via Externalized Bayesian Inference
by: Lin, Fangzhou, et al.
Published: (2026)
Gaperon: A Peppered English-French Generative Language Model Suite
by: Godey, Nathan, et al.
Published: (2025)
Classification of Human- and AI-Generated Texts for English, French, German, and Spanish
by: Schaaff, Kristina, et al.
Published: (2023)
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding
by: Suglia, Alessandro, et al.
Published: (2024)
CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception
by: Carvalho, Miguel, et al.
Published: (2025)
Adapting Multilingual LLMs to Low-Resource Languages with Knowledge Graphs via Adapters
by: Gurgurov, Daniil, et al.
Published: (2024)
MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost
by: Xing, Sen, et al.
Published: (2024)
Toward In-Context Teaching: Adapting Examples to Students' Misconceptions
by: Ross, Alexis, et al.
Published: (2024)
Methodology of Adapting Large English Language Models for Specific Cultural Contexts
by: Zhang, Wenjing, et al.
Published: (2024)
BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning
by: Zhan, Qiusi, et al.
Published: (2025)
CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries
by: Liu, Shudong, et al.
Published: (2025)
SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
by: Ji, Yicheng, et al.
Published: (2025)