:: Library Catalog

Bibliographic Details
Main Authors:	Shen, Yuhao; Qian, Jiahe; Zhang, Shuping; Chen, Zhangtianyi; Lu, Tao; Zhou, Juexiao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.09195

Similar Items

CoTBox-TTT: Grounding Medical VQA with Visual Chain-of-Thought Boxes During Test-time Training
by: Qian, Jiahe, et al.
Published: (2025)

Trustworthy and Fair SkinGPT-R1 for Democratizing Dermatological Reasoning across Diverse Ethnicities
by: Shen, Yuhao, et al.
Published: (2025)

SkinGPT-X: A Self-Evolving Collaborative Multi-Agent System for Transparent and Trustworthy Dermatological Diagnosis
by: Chen, Zhangtianyi, et al.
Published: (2026)

SkinCaRe: A Multimodal Dermatology Dataset Annotated with Medical Caption and Chain-of-Thought Reasoning
by: Shen, Yuhao, et al.
Published: (2024)

Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal
by: Wang, Yuhao, et al.
Published: (2024)

MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
by: Huang, Jinsheng, et al.
Published: (2024)

Tell Me Where You Are: Multimodal LLMs Meet Place Recognition
by: Lyu, Zonglin, et al.
Published: (2024)

DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs
by: Zhao, Jiahe, et al.
Published: (2025)

Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding
by: Wang, Youze, et al.
Published: (2025)

Skin-R1: Toward Trustworthy Clinical Reasoning for Dermatological Diagnosis
by: Liu, Zehao, et al.
Published: (2025)

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
by: Lu, Chaochao, et al.
Published: (2024)

Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
by: Zhang, Shiyi, et al.
Published: (2024)

PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
by: Ouyang, Kun, et al.
Published: (2024)

Evaluating MLLMs with Multimodal Multi-image Reasoning Benchmark
by: Cheng, Ziming, et al.
Published: (2025)

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
by: Yu, Tianyu, et al.
Published: (2023)

HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding
by: Li, Keliang, et al.
Published: (2024)

EventBench: Towards Comprehensive Benchmarking of Event-based MLLMs
by: Liu, Shaoyu, et al.
Published: (2025)

FunBench: Benchmarking Fundus Reading Skills of MLLMs
by: Wei, Qijie, et al.
Published: (2025)

Are Multimodal LLMs Ready for Clinical Dermatology? A Real-World Evaluation in Dermatology
by: Jiang, Roy, et al.
Published: (2026)

Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs
by: Tu, Chongjun, et al.
Published: (2025)

Magic-MM-Embedding: Towards Visual-Token-Efficient Universal Multimodal Embedding with MLLMs
by: Li, Qi, et al.
Published: (2026)

Benchmarking Large and Small MLLMs
by: Feng, Xuelu, et al.
Published: (2025)

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
by: Yuan, Jiakang, et al.
Published: (2025)

Decoupled Competitive Framework for Semi-supervised Medical Image Segmentation
by: Chen, Jiahe, et al.
Published: (2025)

AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
by: Lu, Lidong, et al.
Published: (2025)

Towards Camera-Robust 3D Localization: Equation-Anchored Tool-Use for MLLMs
by: Jiang, Xueying, et al.
Published: (2026)

Decompose and Leverage Preferences from Expert Models for Improving Trustworthiness of MLLMs
by: Cao, Rui, et al.
Published: (2024)

FREAK: A Fine-grained Hallucination Evaluation Benchmark for Advanced MLLMs
by: Yin, Zhihan, et al.
Published: (2026)

ActFormer: Scalable Collaborative Perception via Active Queries
by: Huang, Suozhi, et al.
Published: (2024)

IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
by: Zhang, Tao, et al.
Published: (2025)

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation
by: Wang, Junyang, et al.
Published: (2023)

MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
by: Yao, Huanjin, et al.
Published: (2025)

MokA: Multimodal Low-Rank Adaptation for MLLMs
by: Wei, Yake, et al.
Published: (2025)

THEMIS: Towards Holistic Evaluation of MLLMs for Scientific Paper Fraud Forensics
by: Ma, Tzu-Yen, et al.
Published: (2026)

iDETEX: Empowering MLLMs for Intelligent DETailed EXplainable IQA
by: Zhao, Zhaoran, et al.
Published: (2025)

Towards Benchmarking and Evaluating Deepfake Detection
by: Lin, Chenhao, et al.
Published: (2022)

Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders
by: Fang, Bo, et al.
Published: (2025)

MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models
by: Zhang, Yichi, et al.
Published: (2024)

From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models
by: Fang, Irving, et al.
Published: (2025)

HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
by: Zhao, Jiahe, et al.
Published: (2025)