| Main Authors: | Shapourian, Hassan; Hejazi, Kasra; Sule, Olabode M.; Millidge, Beren |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.08560 |
Similar Items
ZAYA1-8B Technical Report
by: Washbourne, Robert, et al.
Published: (2026)
Seed1.5-VL Technical Report
by: Guo, Dong, et al.
Published: (2025)
Qwen3-VL Technical Report
by: Bai, Shuai, et al.
Published: (2025)
PLaMo 2.1-VL Technical Report
by: Kerola, Tommi, et al.
Published: (2026)
Phoenix-VL 1.5 Medium Technical Report
by: Phoenix, Team, et al.
Published: (2026)
AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model
by: Jin, Zhiwei, et al.
Published: (2025)
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
by: Lu, Dongchen, et al.
Published: (2025)
Motif-Video 2B: Technical Report
by: Lim, Junghwan, et al.
Published: (2026)
Ovis-U1 Technical Report
by: Wang, Guo-Hua, et al.
Published: (2025)
Dolphin v1.0 Technical Report
by: Weng, Taohan, et al.
Published: (2025)
Singpath-VL Technical Report
by: Qiu, Zhen, et al.
Published: (2026)
Kimi-VL Technical Report
by: Kimi Team, et al.
Published: (2025)
Phi-4-reasoning-vision-15B Technical Report
by: Aneja, Jyoti, et al.
Published: (2026)
Ovis-Image Technical Report
by: Wang, Guo-Hua, et al.
Published: (2025)
HunyuanOCR Technical Report
by: Hunyuan Vision Team, et al.
Published: (2025)
Pegasus-v1 Technical Report
by: Jung, Raehyuk, et al.
Published: (2024)
FoundBioNet: A Foundation-Based Model for IDH Genotyping of Glioma from Multi-Parametric MRI
by: Farahani, Somayeh, et al.
Published: (2025)
Qianfan-VL: Domain-Enhanced Universal Vision-Language Models
by: Dong, Daxiang, et al.
Published: (2025)
A-VL: Adaptive Attention for Large Vision-Language Models
by: Zhang, Junyang, et al.
Published: (2024)
STEP3-VL-10B Technical Report
by: Huang, Ailin, et al.
Published: (2026)
Kwai Keye-VL Technical Report
by: Kwai Keye Team, et al.
Published: (2025)
SAIL-VL2 Technical Report
by: Yin, Weijie, et al.
Published: (2025)
Technical Report: Competition Solution For Modelscope-Sora
by: Chen, Shengfu, et al.
Published: (2024)
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
by: Wen, Zichen, et al.
Published: (2026)
MedGemma Technical Report
by: Sellergren, Andrew, et al.
Published: (2025)
Baichuan-Omni Technical Report
by: Li, Yadong, et al.
Published: (2024)
GR-3 Technical Report
by: Cheang, Chilam, et al.
Published: (2025)
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
by: Zhang, Bo, et al.
Published: (2025)
Kwai Keye-VL 1.5 Technical Report
by: Yang, Biao, et al.
Published: (2025)
VL-DPO: Vision-Language-Guided Finetuning for Preference-Aligned Autonomous Driving
by: Xu, Zhefan, et al.
Published: (2026)
SDAR-VL: Stable and Efficient Block-wise Diffusion for Vision-Language Understanding
by: Cheng, Shuang, et al.
Published: (2025)
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification
by: He, Yefei, et al.
Published: (2024)
EventVL: Understand Event Streams via Multimodal Large Language Model
by: Li, Pengteng, et al.
Published: (2025)
ImageRef-VL: Enabling Contextual Image Referencing in Vision-Language Models
by: Yi, Jingwei, et al.
Published: (2025)
PhysBrain 1.0 Technical Report
by: Lian, Shijie, et al.
Published: (2026)
Associative Memories in the Feature Space
by: Salvatori, Tommaso, et al.
Published: (2024)
RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models
by: Varma, Maya, et al.
Published: (2024)
Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries
by: Pranav, Tushar, et al.
Published: (2025)
HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition
by: Yuan, Kun, et al.
Published: (2024)
TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models
by: Fhima, Jonathan, et al.
Published: (2024)