| Main Authors: | Chiliński, Mateusz, Ołtusek, Julita, Jaśkowski, Wojciech |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.16470 |
Similar Items
Arctic-TILT. Business Document Understanding at Sub-Billion Scale
by: Borchmann, Łukasz, et al.
Published: (2024)
Mano Technical Report
by: Fu, Tianyu, et al.
Published: (2025)
Qwen2.5-VL Technical Report
by: Bai, Shuai, et al.
Published: (2025)
VARCO-VISION-2.0 Technical Report
by: Cha, Young-rok, et al.
Published: (2025)
Falcon2-11B Technical Report
by: Malartic, Quentin, et al.
Published: (2024)
Docling Technical Report
by: Auer, Christoph, et al.
Published: (2024)
Skywork-R1V3 Technical Report
by: Shen, Wei, et al.
Published: (2025)
Privacy-Aware Camera 2.0 Technical Report
by: Song, Huan, et al.
Published: (2026)
MedGemma Technical Report
by: Sellergren, Andrew, et al.
Published: (2025)
Baichuan-Omni Technical Report
by: Li, Yadong, et al.
Published: (2024)
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report
by: Cesista, Franz Louis
Published: (2024)
Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent
by: Chen, Wei, et al.
Published: (2024)
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
by: Ma, Guoqing, et al.
Published: (2025)
MiMo-Embodied: X-Embodied Foundation Model Technical Report
by: Hao, Xiaoshuai, et al.
Published: (2025)
Phoenix-VL 1.5 Medium Technical Report
by: Phoenix Team, et al.
Published: (2026)
Pegasus-v1 Technical Report
by: Jung, Raehyuk, et al.
Published: (2024)
Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge
by: Liang, Hao, et al.
Published: (2025)
PhysBrain 1.0 Technical Report
by: Lian, Shijie, et al.
Published: (2026)
Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model
by: Huang, Haoyang, et al.
Published: (2025)
Ovis2.5 Technical Report
by: Lu, Shiyin, et al.
Published: (2025)
Qwen2.5-Omni Technical Report
by: Xu, Jin, et al.
Published: (2025)
UI-Venus-1.5 Technical Report
by: Venus Team, et al.
Published: (2026)
Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization
by: Wang, YiFeng, et al.
Published: (2026)
Qwen3-Omni Technical Report
by: Xu, Jin, et al.
Published: (2025)
Technical Report: Quantifying and Analyzing the Generalization Power of a DNN
by: He, Yuxuan, et al.
Published: (2025)
RADAR: Relative Angular Divergence Across Representations
by: Cadet, Xavier, et al.
Published: (2026)
HiLight: Technical Report on the Motern AI Video Language Model
by: Wang, Zhiting, et al.
Published: (2024)
H2OVL-Mississippi Vision Language Models Technical Report
by: Galib, Shaikat, et al.
Published: (2024)
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
by: Shanghai AI Lab, et al.
Published: (2025)
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
by: Wang, Haoming, et al.
Published: (2025)
Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
by: Tas, Omer Sahin, et al.
Published: (2024)
ICON: Improving Inter-Report Consistency in Radiology Report Generation via Lesion-aware Mixup Augmentation
by: Hou, Wenjun, et al.
Published: (2024)
MAIRA-2: Grounded Radiology Report Generation
by: Bannur, Shruthi, et al.
Published: (2024)
Image Generation Models: A Technical History
by: Shirvani, Rouzbeh
Published: (2026)
TechING: Towards Real World Technical Image Understanding via VLMs
by: Nadeem, Tafazzul, et al.
Published: (2026)
RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection
by: Hou, Wenjun, et al.
Published: (2025)
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
by: Jin, Haibo, et al.
Published: (2023)
Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA
by: Turski, Michał, et al.
Published: (2025)
On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI
by: Restrepo, David, et al.
Published: (2025)
Recurrent Visual Feature Extraction and Stereo Attentions for CT Report Generation
by: Tian, Yuanhe, et al.
Published: (2025)