| Main Authors: | Wang, Fei, Chen, Chengcheng, Chen, Hongyu, Chang, Yugang, Zeng, Weiming |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.01445 |
Similar Items
MID: A Comprehensive Shore-Based Dataset for Multi-Scale Dense Ship Occlusion and Interaction Scenarios
by: Chang, Yugang, et al.
Published: (2024)
Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method
by: Wang, Fei, et al.
Published: (2025)
RSNet: A Light Framework for The Detection of SAR Ship Detection
by: Chen, Hongyu, et al.
Published: (2024)
Breaking the Visual Shortcuts in Multimodal Knowledge-Based Visual Question Answering
by: Lee, Dosung, et al.
Published: (2025)
Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning
by: Zeng, Xingchen, et al.
Published: (2024)
Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering
by: Xue, Junxiao, et al.
Published: (2024)
Music Audio-Visual Question Answering Requires Specialized Multimodal Designs
by: You, Wenhao, et al.
Published: (2025)
Enhancing Scientific Visual Question Answering via Vision-Caption aware Supervised Fine-Tuning
by: Kapuriya, Janak, et al.
Published: (2025)
Multimodal Rationales for Explainable Visual Question Answering
by: Li, Kun, et al.
Published: (2024)
Fully Authentic Visual Question Answering Dataset from Online Communities
by: Chen, Chongyan, et al.
Published: (2023)
MV-CoRe: Multimodal Visual-Conceptual Reasoning for Complex Visual Question Answering
by: Peng, Jingwei, et al.
Published: (2025)
RoadscapesQA: A Multitask, Multimodal Dataset for Visual Question Answering on Indian Roads
by: Iyer, Vijayasri, et al.
Published: (2026)
Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method
by: Wang, Han, et al.
Published: (2025)
VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering
by: Li, Yuyi, et al.
Published: (2025)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
by: Zhang, Xiaoman, et al.
Published: (2023)
Denoising-Enhanced YOLO for Robust SAR Ship Detection
by: Zhao, Xiaojing, et al.
Published: (2026)
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
by: Ma, Jie, et al.
Published: (2023)
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
by: Kabir, Raihan, et al.
Published: (2024)
NASTaR: NovaSAR Automated Ship Target Recognition Dataset
by: Hosseiny, Benyamin, et al.
Published: (2025)
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling
by: Movva, Prahitha, et al.
Published: (2025)
Improving Few-Shot Change Detection Visual Question Answering via Decision-Ambiguity-guided Reinforcement Fine-Tuning
by: Dong, Fuyu, et al.
Published: (2025)
Towards Fine-Grained Video Question Answering
by: Dai, Wei, et al.
Published: (2025)
Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training
by: Su, Tongkun, et al.
Published: (2024)
Benchmarking Large Multimodal Models for Ophthalmic Visual Question Answering with OphthalWeChat
by: Xu, Pusheng, et al.
Published: (2025)
Towards Signboard-Oriented Visual Question Answering: ViSignVQA Dataset, Method and Benchmark
by: Nguyen, Hieu Minh, et al.
Published: (2025)
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
by: Ding, Yihao, et al.
Published: (2024)
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
by: Sood, Ekta, et al.
Published: (2021)
StackOverflowVQA: Stack Overflow Visual Question Answering Dataset
by: Mirzaei, Motahhare, et al.
Published: (2024)
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
by: Wang, Yuduo, et al.
Published: (2023)
Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering
by: Zhang, Zhengxuan, et al.
Published: (2025)
AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering
by: Tuong, Nguyen Anh, et al.
Published: (2026)
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks
by: Lee, Jusung, et al.
Published: (2024)
FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering
by: Zhong, Liangyu, et al.
Published: (2025)
Lightweight SAR Ship Detection via Contrastive Distillation
by: Devasundaram, Surendar, et al.
Published: (2026)
Adapting Lightweight Vision Language Models for Radiological Visual Question Answering
by: Shourya, Aditya, et al.
Published: (2025)
DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes
by: Al-Mohannadi, Aisha, et al.
Published: (2026)
Multimodal Integration of Human-Like Attention in Visual Question Answering
by: Sood, Ekta, et al.
Published: (2021)
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
by: Pham, Tan-Hanh, et al.
Published: (2024)
Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation
by: Xu, Quanxing, et al.
Published: (2026)
Questioning the Stability of Visual Question Answering
by: Rosenfeld, Amir, et al.
Published: (2025)