| Main Authors: | Weng, Weixi, Zhu, Jieming, Meng, Xiaojun, Zhang, Hao, Zhang, Rui, Yuan, Chun |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.07331 |
Similar Items
Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering
by: Hu, Xinyue, et al.
Published: (2023)
Questioning the Stability of Visual Question Answering
by: Rosenfeld, Amir, et al.
Published: (2025)
DocVXQA: Context-Aware Visual Explanations for Document Question Answering
by: Souibgui, Mohamed Ali, et al.
Published: (2025)
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
by: Yu, Zhou, et al.
Published: (2023)
BERT-VQA: Visual Question Answering on Plots
by: Vu, Tai, et al.
Published: (2025)
Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework
by: Weng, Weixi, et al.
Published: (2023)
TPCL: Task Progressive Curriculum Learning for Robust Visual Question Answering
by: Akl, Ahmed, et al.
Published: (2024)
Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering
by: Dong, Junnan, et al.
Published: (2024)
Find The Gap: Knowledge Base Reasoning For Visual Question Answering
by: Barezi, Elham J., et al.
Published: (2024)
COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark
by: Chintapatla, Ishant, et al.
Published: (2025)
Privacy-Aware Document Visual Question Answering
by: Tito, Rubèn, et al.
Published: (2023)
Describe Anything Model for Visual Question Answering on Text-rich Images
by: Vu, Yen-Linh, et al.
Published: (2025)
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
by: Wang, Yuduo, et al.
Published: (2023)
Towards Self-Explainable Document Visual Question Answering with Chain-of-Explanation Predictions
by: Indrehus, Kjetil, et al.
Published: (2026)
Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion
by: Chen, Peiyuan, et al.
Published: (2024)
ReasonDrive: Efficient Visual Question Answering for Autonomous Vehicles with Reasoning-Enhanced Small Vision-Language Models
by: Chahe, Amirhosein, et al.
Published: (2025)
Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review
by: Hartsock, Iryna, et al.
Published: (2024)
Improving Video Question Answering through query-based frame selection
by: Patil, Himanshu, et al.
Published: (2026)
RECODE: Reasoning Through Code Generation for Visual Question Answering
by: Shen, Junhong, et al.
Published: (2025)
SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering
by: Li, Bingxin
Published: (2025)
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
by: Mamaghan, Amir Mohammad Karimi, et al.
Published: (2024)
Exploring Diverse Methods in Visual Question Answering
by: Li, Panfeng, et al.
Published: (2024)
Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering
by: Zhu, He, et al.
Published: (2024)
PromptHub: Enhancing Multi-Prompt Visual In-Context Learning with Locality-Aware Fusion, Concentration and Alignment
by: Luo, Tianci, et al.
Published: (2026)
Federated Learning with Instance-Dependent Noisy Label
by: Wang, Lei, et al.
Published: (2023)
ViInfographicVQA: A Benchmark for Single and Multi-image Visual Question Answering on Vietnamese Infographics
by: Van-Dinh, Tue-Thu, et al.
Published: (2025)
Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks
by: Singh, Simranjit, et al.
Published: (2024)
Taming Cross-Domain Representation Variance in Federated Prototype Learning with Heterogeneous Data Domains
by: Wang, Lei, et al.
Published: (2024)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts
by: Du, Sinan, et al.
Published: (2024)
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types
by: Sinha, Neelabh, et al.
Published: (2024)
Glyph: Scaling Context Windows via Visual-Text Compression
by: Cheng, Jiale, et al.
Published: (2025)
MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering
by: Li, Xu, et al.
Published: (2025)
Compressor-VLA: Instruction-Guided Visual Token Compression for Efficient Robotic Manipulation
by: Gao, Juntao, et al.
Published: (2025)
LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement
by: Bian, Jieming, et al.
Published: (2024)
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
by: Romero, David, et al.
Published: (2024)
Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few
by: Wen, Qishuai, et al.
Published: (2025)
Explore until Confident: Efficient Exploration for Embodied Question Answering
by: Ren, Allen Z., et al.
Published: (2024)
Efficient Conditional Diffusion Model with Probability Flow Sampling for Image Super-resolution
by: Yuan, Yutao, et al.
Published: (2024)
Enriching Knowledge Distillation with Intra-Class Contrastive Learning
by: Yuan, Hua, et al.
Published: (2025)
Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective
by: Wen, Qishuai, et al.
Published: (2024)