Library Catalog

Bibliographic Details
Main Authors:	Chiliński, Mateusz; Ołtusek, Julita; Jaśkowski, Wojciech
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.16470

Similar Items

Arctic-TILT. Business Document Understanding at Sub-Billion Scale
by: Borchmann, Łukasz, et al.
Published: (2024)

Mano Technical Report
by: Fu, Tianyu, et al.
Published: (2025)

Qwen2.5-VL Technical Report
by: Bai, Shuai, et al.
Published: (2025)

VARCO-VISION-2.0 Technical Report
by: Cha, Young-rok, et al.
Published: (2025)

Falcon2-11B Technical Report
by: Malartic, Quentin, et al.
Published: (2024)

Docling Technical Report
by: Auer, Christoph, et al.
Published: (2024)

Skywork-R1V3 Technical Report
by: Shen, Wei, et al.
Published: (2025)

Privacy-Aware Camera 2.0 Technical Report
by: Song, Huan, et al.
Published: (2026)

MedGemma Technical Report
by: Sellergren, Andrew, et al.
Published: (2025)

Baichuan-Omni Technical Report
by: Li, Yadong, et al.
Published: (2024)

Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report
by: Cesista, Franz Louis
Published: (2024)

Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent
by: Chen, Wei, et al.
Published: (2024)

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
by: Ma, Guoqing, et al.
Published: (2025)

MiMo-Embodied: X-Embodied Foundation Model Technical Report
by: Hao, Xiaoshuai, et al.
Published: (2025)

Phoenix-VL 1.5 Medium Technical Report
by: Phoenix Team, et al.
Published: (2026)

Pegasus-v1 Technical Report
by: Jung, Raehyuk, et al.
Published: (2024)

Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge
by: Liang, Hao, et al.
Published: (2025)

PhysBrain 1.0 Technical Report
by: Lian, Shijie, et al.
Published: (2026)

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model
by: Huang, Haoyang, et al.
Published: (2025)

Ovis2.5 Technical Report
by: Lu, Shiyin, et al.
Published: (2025)

Qwen2.5-Omni Technical Report
by: Xu, Jin, et al.
Published: (2025)

UI-Venus-1.5 Technical Report
by: Venus Team, et al.
Published: (2026)

Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization
by: Wang, YiFeng, et al.
Published: (2026)

Qwen3-Omni Technical Report
by: Xu, Jin, et al.
Published: (2025)

Technical Report: Quantifying and Analyzing the Generalization Power of a DNN
by: He, Yuxuan, et al.
Published: (2025)

RADAR: Relative Angular Divergence Across Representations
by: Cadet, Xavier, et al.
Published: (2026)

HiLight: Technical Report on the Motern AI Video Language Model
by: Wang, Zhiting, et al.
Published: (2024)

H2OVL-Mississippi Vision Language Models Technical Report
by: Galib, Shaikat, et al.
Published: (2024)

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
by: Shanghai AI Lab, et al.
Published: (2025)

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
by: Wang, Haoming, et al.
Published: (2025)

Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
by: Tas, Omer Sahin, et al.
Published: (2024)

ICON: Improving Inter-Report Consistency in Radiology Report Generation via Lesion-aware Mixup Augmentation
by: Hou, Wenjun, et al.
Published: (2024)

MAIRA-2: Grounded Radiology Report Generation
by: Bannur, Shruthi, et al.
Published: (2024)

Image Generation Models: A Technical History
by: Shirvani, Rouzbeh
Published: (2026)

TechING: Towards Real World Technical Image Understanding via VLMs
by: Nadeem, Tafazzul, et al.
Published: (2026)

RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection
by: Hou, Wenjun, et al.
Published: (2025)

PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
by: Jin, Haibo, et al.
Published: (2023)

Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA
by: Turski, Michał, et al.
Published: (2025)

On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI
by: Restrepo, David, et al.
Published: (2025)

Recurrent Visual Feature Extraction and Stereo Attentions for CT Report Generation
by: Tian, Yuanhe, et al.
Published: (2025)