Library Catalog

Bibliographic Details
Main Authors:	Xiong, Huimin; Meng, Zijie; Hu, Tianxiang; Zhou, Chenyi; Feng, Yang; Liu, Zuozhu
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.16781

Similar Items

DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice
by: Meng, Zijie, et al.
Published: (2025)

Detecting Dental Landmarks from Intraoral 3D Scans: the 3DTeethLand challenge
by: Ben-Hamadou, Achraf, et al.
Published: (2025)

KPL: Training-Free Medical Knowledge Mining of Vision-Language Models
by: Liu, Jiaxiang, et al.
Published: (2025)

Teeth3DS+: An Extended Benchmark for Intraoral 3D Scans Analysis
by: Ben-Hamadou, Achraf, et al.
Published: (2022)

3D-RAD: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks
by: Gai, Xiaotang, et al.
Published: (2025)

MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale
by: Gai, Xiaotang, et al.
Published: (2024)

Silhouette-to-Contour Registration: Aligning Intraoral Scan Models with Cephalometric Radiographs
by: Miao, Yiyi, et al.
Published: (2025)

Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
by: Jiang, Songtao, et al.
Published: (2025)

DinoDental: Benchmarking DINOv3 as a Unified Vision Encoder for Dental Image Analysis
by: Tang, Kun, et al.
Published: (2026)

Dental3R: Geometry-Aware Pairing for Intraoral 3D Reconstruction from Sparse-View Photographs
by: Miao, Yiyi, et al.
Published: (2025)

Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function
by: Zhuang, Chenyi, et al.
Published: (2024)

Quantized Prompt for Efficient Generalization of Vision-Language Models
by: Hao, Tianxiang, et al.
Published: (2024)

Modest-Align: Data-Efficient Alignment for Vision-Language Models
by: Liu, Jiaxiang, et al.
Published: (2025)

Fair-MoE: Fairness-Oriented Mixture of Experts in Vision-Language Models
by: Wang, Peiran, et al.
Published: (2025)

Evaluating the Suitability of Different Intraoral Scan Resolutions for Deep Learning-Based Tooth Segmentation
by: Weekley, Daron, et al.
Published: (2025)

Modality-Fair Preference Optimization for Trustworthy MLLM Alignment
by: Jiang, Songtao, et al.
Published: (2024)

HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models
by: Jiang, Songtao, et al.
Published: (2025)

Understanding Degradation with Vision Language Model
by: Lan, Guanzhou, et al.
Published: (2026)

SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
by: Zhan, Yang, et al.
Published: (2024)

Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model
by: Shi, Yiming, et al.
Published: (2024)

Advancing Lung Disease Diagnosis in 3D CT Scans
by: Li, Qingqiu, et al.
Published: (2025)

Med-GLIP: Advancing Medical Language-Image Pre-training with Large-scale Grounded Dataset
by: Deng, Ziye, et al.
Published: (2025)

HICT: High-precision 3D CBCT reconstruction from a single X-ray
by: Ma, Wen, et al.
Published: (2026)

High-Fidelity 3D Tooth Reconstruction by Fusing Intraoral Scans and CBCT Data via a Deep Implicit Representation
by: Zhu, Yi, et al.
Published: (2026)

Med3D-R1: Incentivizing Clinical Reasoning in 3D Medical Vision-Language Models for Abnormality Diagnosis
by: Lai, Haoran, et al.
Published: (2026)

PX2Tooth: Reconstructing the 3D Point Cloud Teeth from a Single Panoramic X-ray
by: Ma, Wen, et al.
Published: (2024)

LT-Gaussian: Long-Term Map Update Using 3D Gaussian Splatting for Autonomous Driving
by: Cheng, Luqi, et al.
Published: (2025)

Uni4D: A Unified Self-Supervised Learning Framework for Point Cloud Videos
by: Zuo, Zhi, et al.
Published: (2025)

Delving into Out-of-Distribution Detection with Medical Vision-Language Models
by: Ju, Lie, et al.
Published: (2025)

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
by: Jin, Yang, et al.
Published: (2023)

Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering
by: Chen, Yixiong, et al.
Published: (2025)

GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
by: Zhou, Yue, et al.
Published: (2024)

GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving
by: Qi, Zhangshuo, et al.
Published: (2024)

A Unified Perspective on Adversarial Membership Manipulation in Vision Models
by: Gao, Ruize, et al.
Published: (2026)

ARM3D: Attention-based relation module for indoor 3D object detection
by: Lan, Yuqing, et al.
Published: (2022)

Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models
by: Zhou, Zijie, et al.
Published: (2026)

Unified Personalized Reward Model for Vision Generation
by: Wang, Yibin, et al.
Published: (2026)

Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models
by: Jiang, Songtao, et al.
Published: (2024)

CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model
by: Luo, Yuxuan, et al.
Published: (2025)

From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach
by: Wang, Xilin, et al.
Published: (2024)