| Main Authors: | Shen, Yuhao, Qian, Jiahe, Zhang, Shuping, Chen, Zhangtianyi, Lu, Tao, Zhou, Juexiao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.09195 |
Similar Items
CoTBox-TTT: Grounding Medical VQA with Visual Chain-of-Thought Boxes During Test-time Training
by: Qian, Jiahe, et al.
Published: (2025)
Trustworthy and Fair SkinGPT-R1 for Democratizing Dermatological Reasoning across Diverse Ethnicities
by: Shen, Yuhao, et al.
Published: (2025)
SkinGPT-X: A Self-Evolving Collaborative Multi-Agent System for Transparent and Trustworthy Dermatological Diagnosis
by: Chen, Zhangtianyi, et al.
Published: (2026)
SkinCaRe: A Multimodal Dermatology Dataset Annotated with Medical Caption and Chain-of-Thought Reasoning
by: Shen, Yuhao, et al.
Published: (2024)
Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal
by: Wang, Yuhao, et al.
Published: (2024)
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
by: Huang, Jinsheng, et al.
Published: (2024)
Tell Me Where You Are: Multimodal LLMs Meet Place Recognition
by: Lyu, Zonglin, et al.
Published: (2024)
DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs
by: Zhao, Jiahe, et al.
Published: (2025)
Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding
by: Wang, Youze, et al.
Published: (2025)
Skin-R1: Toward Trustworthy Clinical Reasoning for Dermatological Diagnosis
by: Liu, Zehao, et al.
Published: (2025)
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
by: Lu, Chaochao, et al.
Published: (2024)
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
by: Zhang, Shiyi, et al.
Published: (2024)
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
by: Ouyang, Kun, et al.
Published: (2024)
Evaluating MLLMs with Multimodal Multi-image Reasoning Benchmark
by: Cheng, Ziming, et al.
Published: (2025)
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
by: Yu, Tianyu, et al.
Published: (2023)
HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding
by: Li, Keliang, et al.
Published: (2024)
EventBench: Towards Comprehensive Benchmarking of Event-based MLLMs
by: Liu, Shaoyu, et al.
Published: (2025)
FunBench: Benchmarking Fundus Reading Skills of MLLMs
by: Wei, Qijie, et al.
Published: (2025)
Are Multimodal LLMs Ready for Clinical Dermatology? A Real-World Evaluation in Dermatology
by: Jiang, Roy, et al.
Published: (2026)
Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs
by: Tu, Chongjun, et al.
Published: (2025)
Magic-MM-Embedding: Towards Visual-Token-Efficient Universal Multimodal Embedding with MLLMs
by: Li, Qi, et al.
Published: (2026)
Benchmarking Large and Small MLLMs
by: Feng, Xuelu, et al.
Published: (2025)
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
by: Yuan, Jiakang, et al.
Published: (2025)
Decoupled Competitive Framework for Semi-supervised Medical Image Segmentation
by: Chen, Jiahe, et al.
Published: (2025)
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
by: Lu, Lidong, et al.
Published: (2025)
Towards Camera-Robust 3D Localization: Equation-Anchored Tool-Use for MLLMs
by: Jiang, Xueying, et al.
Published: (2026)
Decompose and Leverage Preferences from Expert Models for Improving Trustworthiness of MLLMs
by: Cao, Rui, et al.
Published: (2024)
FREAK: A Fine-grained Hallucination Evaluation Benchmark for Advanced MLLMs
by: Yin, Zhihan, et al.
Published: (2026)
ActFormer: Scalable Collaborative Perception via Active Queries
by: Huang, Suozhi, et al.
Published: (2024)
IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
by: Zhang, Tao, et al.
Published: (2025)
AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation
by: Wang, Junyang, et al.
Published: (2023)
MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
by: Yao, Huanjin, et al.
Published: (2025)
MokA: Multimodal Low-Rank Adaptation for MLLMs
by: Wei, Yake, et al.
Published: (2025)
THEMIS: Towards Holistic Evaluation of MLLMs for Scientific Paper Fraud Forensics
by: Ma, Tzu-Yen, et al.
Published: (2026)
iDETEX: Empowering MLLMs for Intelligent DETailed EXplainable IQA
by: Zhao, Zhaoran, et al.
Published: (2025)
Towards Benchmarking and Evaluating Deepfake Detection
by: Lin, Chenhao, et al.
Published: (2022)
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders
by: Fang, Bo, et al.
Published: (2025)
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models
by: Zhang, Yichi, et al.
Published: (2024)
From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models
by: Fang, Irving, et al.
Published: (2025)
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
by: Zhao, Jiahe, et al.
Published: (2025)