Bibliographic Details
Main Authors:	Kerola, Tommi, Masuda, Yuya, Masuko, Takashi, Nakanishi, Toshiki, Nishino, Daisuke, Takahashi, Kuniyuki, Wang, Hanqin, Yamada, Yoshihiro
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.19324
Similar Items

PLaMo 2 Technical Report
by: Preferred Networks, et al.
Published: (2025)

Xiaomi MiMo-VL-Miloco Technical Report
by: Li, Jiaze, et al.
Published: (2025)

Singpath-VL Technical Report
by: Qiu, Zhen, et al.
Published: (2026)

Kimi-VL Technical Report
by: Kimi Team, et al.
Published: (2025)

Kwai Keye-VL Technical Report
by: Kwai Keye Team, et al.
Published: (2025)

SAIL-VL2 Technical Report
by: Yin, Weijie, et al.
Published: (2025)

PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency
by: Preferred Elements, et al.
Published: (2024)

Qwen3-VL Technical Report
by: Bai, Shuai, et al.
Published: (2025)

STEP3-VL-10B Technical Report
by: Huang, Ailin, et al.
Published: (2026)

Kwai Keye-VL 1.5 Technical Report
by: Yang, Biao, et al.
Published: (2025)

SAID-NeRF: Segmentation-AIDed NeRF for Depth Completion of Transparent Objects
by: Ummadisingu, Avinash, et al.
Published: (2024)

Qwen2.5-VL Technical Report
by: Bai, Shuai, et al.
Published: (2025)

Seed1.5-VL Technical Report
by: Guo, Dong, et al.
Published: (2025)

ZAYA1-VL-8B Technical Report
by: Shapourian, Hassan, et al.
Published: (2026)

Phoenix-VL 1.5 Medium Technical Report
by: Phoenix, Team, et al.
Published: (2026)

AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model
by: Jin, Zhiwei, et al.
Published: (2025)

CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
by: Yamada, Yoshihiro
Published: (2025)

Person-In-Situ: Scene-Consistent Human Image Insertion with Occlusion-Aware Pose Control
by: Masuda, Shun, et al.
Published: (2025)

TerraFusion: Joint Generation of Terrain Geometry and Texture Using Latent Diffusion Models
by: Higo, Kazuki, et al.
Published: (2025)

J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM
by: Yoshida, Takero, et al.
Published: (2024)

MiMo-Embodied: X-Embodied Foundation Model Technical Report
by: Hao, Xiaoshuai, et al.
Published: (2025)

Quantifying Cancer Likeness: A Statistical Approach for Pathological Image Diagnosis
by: Kindo, Toshiki
Published: (2024)

Kelix Technical Report
by: Ding, Boyang, et al.
Published: (2026)

VEN-VL: A Visual Ensemble MoE Framework for Effective and Efficient Multi-Modal Understanding
by: Wu, Yinghao, et al.
Published: (2026)

MoVL: Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks
by: Tian, Haijiang, et al.
Published: (2024)

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
by: Uchida, Kengo, et al.
Published: (2024)

Detection of trade in products derived from threatened species using machine learning and a smartphone
by: Kulkarni, Ritwik, et al.
Published: (2025)

StreamingClaw Technical Report
by: Chen, Jiawei, et al.
Published: (2026)

Uni-Parser Technical Report
by: Fang, Xi, et al.
Published: (2025)

Step-GUI Technical Report
by: Yan, Haolong, et al.
Published: (2025)

Logics-Parsing Technical Report
by: Chen, Xiangyang, et al.
Published: (2025)

ABot-OCR Technical Report
by: Jiang, Kaitao, et al.
Published: (2026)

NeuroClaw Technical Report
by: Wang, Cheng, et al.
Published: (2026)

Qwen-Image Technical Report
by: Wu, Chenfei, et al.
Published: (2025)

Kling-Omni Technical Report
by: Kling Team, et al.
Published: (2025)

InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
by: Lu, Dongchen, et al.
Published: (2025)

HeatFormer: A Neural Optimizer for Multiview Human Mesh Recovery
by: Matsubara, Yuto, et al.
Published: (2024)

Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance
by: Enyo, Yuto, et al.
Published: (2023)

ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model
by: Chu, Xuangeng, et al.
Published: (2025)

Time-varying rPPG signal separation via block-sparse signal model
by: Kurihara, Kosuke, et al.
Published: (2026)