| Main Authors: | Zhang, Chenghanyu, Li, Zekun, Li, Peipei, Cui, Xing, Xia, Shuhan, Yan, Weixiang, Zhang, Yiqiao, Zhuang, Qianyu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.12267 |
Similar Items
SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
by: Zhao, Ming, et al.
Published: (2025)
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
by: Liu, Xuannan, et al.
Published: (2024)
AVFakeBench: A Comprehensive Audio-Video Forgery Detection Benchmark for AV-LMMs
by: Xia, Shuhan, et al.
Published: (2025)
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
by: Liu, Xuannan, et al.
Published: (2025)
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
by: Zou, Yueying, et al.
Published: (2025)
Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey
by: Liu, Xuannan, et al.
Published: (2024)
3-Tracer: A Tri-level Temporal-Aware Framework for Audio Forgery Detection and Localization
by: Xia, Shuhan, et al.
Published: (2025)
MERLIN: Building Low-SNR Robust Multimodal LLMs for Electromagnetic Signals
by: Shen, Junyu, et al.
Published: (2026)
Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner
by: Cui, Xing, et al.
Published: (2024)
Benchmarking PathCLIP for Pathology Image Analysis
by: Zheng, Sunyi, et al.
Published: (2024)
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
by: Peng, Yuang, et al.
Published: (2024)
MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
by: Pan, Yaning, et al.
Published: (2025)
ID-Cloak: Crafting Identity-Specific Cloaks Against Personalized Text-to-Image Generation
by: Teng, Qianrui, et al.
Published: (2025)
Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion
by: Hansen, Colin, et al.
Published: (2024)
EarthSpatialBench: Benchmarking Spatial Reasoning Capabilities of Multimodal LLMs on Earth Imagery
by: Xu, Zelin, et al.
Published: (2026)
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents
by: Huang, Peizhou, et al.
Published: (2026)
DSI-Bench: A Benchmark for Dynamic Spatial Intelligence
by: Zhang, Ziang, et al.
Published: (2025)
VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents
by: Zhang, Zhengbo, et al.
Published: (2026)
SpinalSAM-R1: A Vision-Language Multimodal Interactive System for Spine CT Segmentation
by: Liu, Jiaming, et al.
Published: (2025)
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs
by: Zhou, Yikang, et al.
Published: (2025)
CFCPalsy: Facial Image Synthesis with Cross-Fusion Cycle Diffusion Model for Facial Paralysis Individuals
by: Gao, Weixiang, et al.
Published: (2024)
11Plus-Bench: Demystifying Multimodal LLM Spatial Reasoning with Cognitive-Inspired Analysis
by: Li, Chengzu, et al.
Published: (2025)
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
by: Ma, David, et al.
Published: (2025)
MMDG-Bench: A Benchmark for Multimodal Domain Generalization
by: Zhan, Qianshan, et al.
Published: (2026)
SpineMamba: Enhancing 3D Spinal Segmentation in Clinical Imaging through Residual Visual Mamba Layers and Shape Priors
by: Zhang, Zhiqing, et al.
Published: (2024)
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
by: Sun, Yuxuan, et al.
Published: (2024)
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
by: Liu, Ziqiang, et al.
Published: (2024)
Q-Bench-Portrait: Benchmarking Multimodal Large Language Models on Portrait Image Quality Perception
by: Wu, Sijing, et al.
Published: (2026)
ReactBench: A Cause-Driven Benchmark for Multimodal Hallucination via Systematic Evaluation
by: Zhou, Shizhe, et al.
Published: (2026)
Beyond Emotion Recognition: A Multi-Turn Multimodal Emotion Understanding and Reasoning Benchmark
by: Hu, Jinpeng, et al.
Published: (2025)
VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models
by: Li, Yunhao, et al.
Published: (2026)
VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models
by: Xu, Mingjie, et al.
Published: (2025)
MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal
by: Nie, Yiqi, et al.
Published: (2026)
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
by: Kil, Jihyung, et al.
Published: (2024)
EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
by: Li, Ming, et al.
Published: (2024)
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
by: Li, Bohao, et al.
Published: (2024)
BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning
by: Wu, Baoyuan, et al.
Published: (2024)
Lingua-SafetyBench: A Benchmark for Safety Evaluation of Multilingual Vision-Language Models
by: Shi, Enyi, et al.
Published: (2026)
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
by: Ouyang, Kun, et al.
Published: (2024)
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation
by: Zhou, Li, et al.
Published: (2025)