| Main Authors: | Zeng, Kang, Zhong, Guojin, Cheng, Jintao, Yuan, Jin, Li, Zhiyong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.17860 |
Similar Items
Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning
by: Zeng, Xingchen, et al.
Published: (2024)
MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems
by: Xu, Quanxing, et al.
Published: (2026)
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks
by: Lee, Jusung, et al.
Published: (2024)
Benchmarking Large Multimodal Models for Ophthalmic Visual Question Answering with OphthalWeChat
by: Xu, Pusheng, et al.
Published: (2025)
Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering
by: Chen, Zixin, et al.
Published: (2025)
SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning
by: Zhang, Xu, et al.
Published: (2025)
AVIR: Adaptive Visual In-Document Retrieval for Efficient Multi-Page Document Question Answering
by: Li, Zongmin, et al.
Published: (2026)
Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering
by: Hu, Zhongjian, et al.
Published: (2024)
Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models
by: Park, Jean, et al.
Published: (2024)
Evaluating Multimodal Large Language Models on Educational Textbook Question Answering
by: Alawwad, Hessa A., et al.
Published: (2025)
Advancing Egocentric Video Question Answering with Multimodal Large Language Models
by: Patel, Alkesh, et al.
Published: (2025)
Large Vision-Language Models for Remote Sensing Visual Question Answering
by: Siripong, Surasakdi, et al.
Published: (2024)
Multi-TW: Benchmarking Multimodal Models on Traditional Chinese Question Answering in Taiwan
by: Yao, Jui-Ming, et al.
Published: (2025)
Multimodal Rationales for Explainable Visual Question Answering
by: Li, Kun, et al.
Published: (2024)
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering
by: Li, Peize, et al.
Published: (2024)
MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering
by: Srivastava, Varun, et al.
Published: (2025)
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
by: Pan, Zhenyu, et al.
Published: (2024)
Visual Question Decomposition on Multimodal Large Language Models
by: Zhang, Haowei, et al.
Published: (2024)
MMAPG: A Training-Free Framework for Multimodal Multi-hop Question Answering via Adaptive Planning Graphs
by: Hu, Yiheng, et al.
Published: (2025)
Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering
by: Sun, Hongda, et al.
Published: (2024)
Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
by: Xu, Quanxing, et al.
Published: (2026)
Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion
by: Chen, Peiyuan, et al.
Published: (2024)
VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering
by: Lim, Qi Zhi, et al.
Published: (2025)
BALL SHRAMIKO KI SAMASYAYEN AVAM UNNMULAN HETU SARKARI PRAYAAS
by: DR. POONAM, et al.
Published: (2017)
Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training
by: Su, Tongkun, et al.
Published: (2024)
Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation
by: Chen, Jiajun, et al.
Published: (2023)
MultiCube-RAG for Multi-hop Question Answering
by: Shi, Jimeng, et al.
Published: (2026)
Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models
by: Yao, Jui-Ming, et al.
Published: (2025)
Efficient Multimodal Planning Agent for Visual Question-Answering
by: Chen, Zhuo, et al.
Published: (2026)
Multimodal Reranking for Knowledge-Intensive Visual Question Answering
by: Wen, Haoyang, et al.
Published: (2024)
Multimodal Commonsense Knowledge Distillation for Visual Question Answering
by: Yang, Shuo, et al.
Published: (2024)
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
by: Wang, Yuduo, et al.
Published: (2023)
EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering and Reasoning
by: Wei, Mingyang, et al.
Published: (2026)
MQAD: A Large-Scale Question Answering Dataset for Training Music Large Language Models
by: Ouyang, Zhihao, et al.
Published: (2025)
On Domain-Adaptive Post-Training for Multimodal Large Language Models
by: Cheng, Daixuan, et al.
Published: (2024)
Music Audio-Visual Question Answering Requires Specialized Multimodal Designs
by: You, Wenhao, et al.
Published: (2025)
Visually Interpretable Subtask Reasoning for Visual Question Answering
by: Cheng, Yu, et al.
Published: (2025)
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation
by: Xu, Derong, et al.
Published: (2024)
Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering
by: Shi, Yucheng, et al.
Published: (2024)
Breaking the Visual Shortcuts in Multimodal Knowledge-Based Visual Question Answering
by: Lee, Dosung, et al.
Published: (2025)