| Main Authors: | Lu, Sheng; Chen, Hao; Yin, Rui; Ba, Juyan; Zhang, Yu; Li, Yuanzhe |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.19516 |
Similar Items
iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer
by: Zhou, Fengtao, et al.
Published: (2024)
LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models
by: Qin, Zhenyue, et al.
Published: (2024)
A Comparative Analysis of Image Descriptors for Histopathological Classification of Gastric Cancer
by: Usai, Marco, et al.
Published: (2025)
SteelDefectX: A Multi-Form Vision-Language Dataset and Benchmark for Steel Surface Defect Analysis
by: Zhao, Shuxian, et al.
Published: (2026)
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
by: Shao, Hao, et al.
Published: (2024)
LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology
by: Qin, Zhenyue, et al.
Published: (2025)
HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation
by: Liang, Jiaming, et al.
Published: (2025)
Soft-Label Anonymous Gastric X-ray Image Distillation
by: Li, Guang, et al.
Published: (2021)
ClimateIQA: A New Dataset and Benchmark to Advance Vision-Language Models in Meteorology Anomalies Analysis
by: Chen, Jian, et al.
Published: (2024)
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
by: Meng, Fanqing, et al.
Published: (2024)
Multimodal Dataset Distillation via Phased Teacher Models
by: Guo, Shengbin, et al.
Published: (2026)
Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis
by: Hao, Jing, et al.
Published: (2025)
Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation
by: Kerdegari, Hamideh, et al.
Published: (2024)
Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis
by: Weng, Xingxing, et al.
Published: (2026)
Vision-Language Models for Vision Tasks: A Survey
by: Zhang, Jingyi, et al.
Published: (2023)
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model
by: Liu, Ting, et al.
Published: (2024)
Practical X-ray Gastric Cancer Diagnostic Support Using Refined Stochastic Data Augmentation and Hard Boundary Box Training
by: Okamoto, Hideaki, et al.
Published: (2021)
LLaVAShield: Safeguarding Multimodal Multi-Turn Dialogues in Vision-Language Models
by: Huang, Guolei, et al.
Published: (2025)
MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models
by: Li, Jiale, et al.
Published: (2025)
Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond
by: Zhang, Fan, et al.
Published: (2025)
SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models
by: Yu, Youngjoon, et al.
Published: (2024)
Motion-Guided Dual-Camera Tracker for Endoscope Tracking and Motion Analysis in a Mechanical Gastric Simulator
by: Zhang, Yuelin, et al.
Published: (2024)
Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset
by: Chen, Qian, et al.
Published: (2026)
Jagle: Building a Large-Scale Japanese Multimodal Post-Training Dataset for Vision-Language Models
by: Sugiura, Issa, et al.
Published: (2026)
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
by: Ying, Kaining, et al.
Published: (2024)
Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation
by: Nguyen, Huu Tien, et al.
Published: (2025)
Assessment of Multimodal Large Language Models in Alignment with Human Values
by: Shi, Zhelun, et al.
Published: (2024)
Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models
by: Khanal, Bidur, et al.
Published: (2025)
DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark
by: Li, Haodong, et al.
Published: (2024)
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
by: Ma, Yingzi, et al.
Published: (2024)
Can Vision Language Models Assess Graphic Design Aesthetics? A Benchmark, Evaluation, and Dataset Perspective
by: An, Arctanx, et al.
Published: (2026)
KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion Models
by: Newbury, Rhys, et al.
Published: (2025)
Res-Bench: Benchmarking the Robustness of Multimodal Large Language Models to Dynamic Resolution Input
by: Li, Chenxu, et al.
Published: (2025)
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
by: Du, Hao, et al.
Published: (2025)
Pedestrian Crossing Intention Prediction Using Multimodal Fusion Network
by: Li, Yuanzhe, et al.
Published: (2025)
A Touch, Vision, and Language Dataset for Multimodal Alignment
by: Fu, Letian, et al.
Published: (2024)
PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models
by: Xu, Zhaopan, et al.
Published: (2025)
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
by: Wu, Jiannan, et al.
Published: (2024)
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
by: Ren, Xiyu, et al.
Published: (2026)
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
by: Zhang, Yuting, et al.
Published: (2025)