| Main Authors: | Tian, Weiwei, Huang, Xinyu, Cheng, Tianhao, He, Wen, Fang, Jinwu, Feng, Rui, Geng, Daoying, Zhang, Xiaobo |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.02608 |
Similar Items
MOSMOS: Multi-organ segmentation facilitated by medical report supervision
by: Tian, Weiwei, et al.
Published: (2024)
Tag2Text: Guiding Vision-Language Model via Image Tagging
by: Huang, Xinyu, et al.
Published: (2023)
AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation
by: Li, Qingqiu, et al.
Published: (2025)
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
by: Huang, Xinyu, et al.
Published: (2025)
Attribute-Grounded Selective Reasoning for Artwork Emotion Understanding with Multimodal Large Language Models
by: Zhang, Cheng, et al.
Published: (2026)
ThinkFake: Reasoning in Multimodal Large Language Models for AI-Generated Image Detection
by: Huang, Tai-Ming, et al.
Published: (2025)
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
by: Fang, Rongyao, et al.
Published: (2025)
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
by: Ye, Qilang, et al.
Published: (2024)
Aligning Medical Images with General Knowledge from Large Language Models
by: Fang, Xiao, et al.
Published: (2024)
Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding
by: He, Jinlong, et al.
Published: (2024)
MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems
by: Xu, Quanxing, et al.
Published: (2026)
MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models
by: Huang, Yu, et al.
Published: (2025)
Beyond Medical Diagnostics: How Medical Multimodal Large Language Models Think in Space
by: Trinh, Quoc-Huy, et al.
Published: (2026)
ParseCaps: An Interpretable Parsing Capsule Network for Medical Image Diagnosis
by: Geng, Xinyu, et al.
Published: (2024)
Language-Instructed Reasoning for Group Activity Detection via Multimodal Large Language Model
by: Peng, Jihua, et al.
Published: (2025)
Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models
by: He, Xin, et al.
Published: (2024)
E5-V: Universal Embeddings with Multimodal Large Language Models
by: Jiang, Ting, et al.
Published: (2024)
Can Large Vision-Language Models Understand Multimodal Sarcasm?
by: Wang, Xinyu, et al.
Published: (2025)
FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks
by: Wu, Peiran, et al.
Published: (2024)
Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models
by: Wang, Lehan, et al.
Published: (2025)
Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models
by: Ding, Meidan, et al.
Published: (2025)
UniChange: Unifying Change Detection with Multimodal Large Language Model
by: Zhang, Xu, et al.
Published: (2025)
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
by: Shen, Leyang, et al.
Published: (2024)
MedMO: Grounding and Understanding Multimodal Large Language Model for Medical Images
by: Deria, Ankan, et al.
Published: (2026)
Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions
by: Chen, Lin, et al.
Published: (2026)
Anatomical Structure-Guided Medical Vision-Language Pre-training
by: Li, Qingqiu, et al.
Published: (2024)
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
by: Ai, Hao, et al.
Published: (2025)
ENC-Bench: A Benchmark for Evaluating Multimodal Large Language Models in Electronic Navigational Chart Understanding
by: Cheng, Ao, et al.
Published: (2026)
NavBench: Probing Multimodal Large Language Models for Embodied Navigation
by: Qiao, Yanyuan, et al.
Published: (2025)
SPD-Faith Bench: Diagnosing and Improving Faithfulness in Chain-of-Thought for Multimodal Large Language Models
by: Lv, Weijiang, et al.
Published: (2026)
Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
by: Ning, Zhenhua, et al.
Published: (2025)
Harnessing Chain-of-Thought Reasoning in Multimodal Large Language Models for Face Anti-Spoofing
by: Zhang, Honglu, et al.
Published: (2025)
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
by: Liu, Haogeng, et al.
Published: (2024)
Leveraging Large Language Models for Multimodal Search
by: Barbany, Oriol, et al.
Published: (2024)
VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models
by: He, Xinan, et al.
Published: (2025)
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
by: Cai, Rui, et al.
Published: (2025)
MMaDA: Multimodal Large Diffusion Language Models
by: Yang, Ling, et al.
Published: (2025)
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
by: Xu, Xiao, et al.
Published: (2024)
SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models
by: Wang, Hanqing, et al.
Published: (2025)
Check Field Detection Agent (CFD-Agent) using Multimodal Large Language and Vision Language Models
by: Halder, Sourav, et al.
Published: (2025)