| Main Authors: | Wang, Lehan, Wang, Haonan, Yang, Honglong, Mao, Jiaji, Yang, Zehong, Shen, Jun, Li, Xiaomeng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.18387 |
Similar Items
Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models
by: Wang, Lehan, et al.
Published: (2025)
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
by: Wang, Haonan, et al.
Published: (2025)
Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration
by: Yang, Honglong, et al.
Published: (2025)
Which Tool Response Should I Trust? Tool-Expertise-Aware Chest X-ray Agent with Multimodal Agentic Learning
by: Huai, Zheang, et al.
Published: (2026)
FITA: Fine-grained Image-Text Aligner for Radiology Report Generation
by: Yang, Honglong, et al.
Published: (2024)
Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant
by: Wang, Haonan, et al.
Published: (2025)
Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis
by: Qin, Yi, et al.
Published: (2026)
DDaTR: Dynamic Difference-aware Temporal Residual Network for Longitudinal Radiology Report Generation
by: Song, Shanshan, et al.
Published: (2025)
From Learning to Unlearning: Biomedical Security Protection in Multimodal Large Language Models
by: Xu, Dunyuan, et al.
Published: (2025)
Generalizable Entity Grounding via Assistance of Large Language Model
by: Qi, Lu, et al.
Published: (2024)
VAMPIRE: Uncovering Vessel Directional and Morphological Information from OCTA Images for Cardiovascular Disease Risk Factor Prediction
by: Wang, Lehan, et al.
Published: (2025)
S&D Messenger: Exchanging Semantic and Domain Knowledge for Generic Semi-Supervised Medical Image Segmentation
by: Zhang, Qixiang, et al.
Published: (2024)
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
by: Yan, Ziang, et al.
Published: (2024)
MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images
by: Wang, Lehan, et al.
Published: (2024)
MMaDA: Multimodal Large Diffusion Language Models
by: Yang, Ling, et al.
Published: (2025)
Token Activation Map to Visually Explain Multimodal LLMs
by: Li, Yi, et al.
Published: (2025)
FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks
by: Wu, Peiran, et al.
Published: (2024)
DiverseDiT: Towards Diverse Representation Learning in Diffusion Transformers
by: Yang, Mengping, et al.
Published: (2026)
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models
by: Zhan, Yufei, et al.
Published: (2024)
The Impact of Image Resolution on Biomedical Multimodal Large Language Models
by: Chen, Liangyu, et al.
Published: (2025)
Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation
by: Liu, Xiaohong, et al.
Published: (2024)
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
by: Qi, Peng, et al.
Published: (2024)
LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models
by: Lin, Ci-Siang, et al.
Published: (2025)
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models
by: Zhang, Guosheng, et al.
Published: (2025)
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
by: Wang, Chunshi, et al.
Published: (2025)
UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
by: Li, Yi, et al.
Published: (2025)
Multimodal Model for Computational Pathology: Representation Learning and Image Compression
by: Wu, Peihang, et al.
Published: (2026)
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
by: Wang, Haomin, et al.
Published: (2025)
Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
by: Wang, Yizhou, et al.
Published: (2025)
Debiasing Multimodal Large Language Models via Penalization of Language Priors
by: Zhang, YiFan, et al.
Published: (2024)
EAGLE: Expert-Augmented Attention Guidance for Tuning-Free Industrial Anomaly Detection in Multimodal Large Language Models
by: Peng, Xiaomeng, et al.
Published: (2026)
Revealing the Impact of Visual Text Style on Attribute-based Descriptions Produced by Large Visual Language Models
by: Wang, Xiaomeng, et al.
Published: (2026)
FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings
by: Wang, Zhen, et al.
Published: (2023)
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
by: Zhao, Zongchuang, et al.
Published: (2025)
ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding
by: Wang, Haonan, et al.
Published: (2025)
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation
by: Wang, Haonan, et al.
Published: (2024)
LithoBench: Benchmarking Large Multimodal Models for Remote-Sensing Lithology Interpretation
by: Wang, Jun, et al.
Published: (2026)
MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models
by: Huang, Yu, et al.
Published: (2025)
A Cognitive Process-Inspired Architecture for Subject-Agnostic Brain Visual Decoding
by: Lu, Jingyu, et al.
Published: (2025)
EM-KD: Distilling Efficient Multimodal Large Language Model with Unbalanced Vision Tokens
by: Feng, Ze, et al.
Published: (2025)