:: Library Catalog

Bibliographic Details
Main Authors:	Weng, Weixi; Zhu, Jieming; Meng, Xiaojun; Zhang, Hao; Zhang, Rui; Yuan, Chun
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2409.07331

Similar Items

Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering
by: Hu, Xinyue, et al.
Published: (2023)

Questioning the Stability of Visual Question Answering
by: Rosenfeld, Amir, et al.
Published: (2025)

DocVXQA: Context-Aware Visual Explanations for Document Question Answering
by: Souibgui, Mohamed Ali, et al.
Published: (2025)

Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
by: Yu, Zhou, et al.
Published: (2023)

BERT-VQA: Visual Question Answering on Plots
by: Vu, Tai, et al.
Published: (2025)

Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework
by: Weng, Weixi, et al.
Published: (2023)

TPCL: Task Progressive Curriculum Learning for Robust Visual Question Answering
by: Akl, Ahmed, et al.
Published: (2024)

Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering
by: Dong, Junnan, et al.
Published: (2024)

Find The Gap: Knowledge Base Reasoning For Visual Question Answering
by: Barezi, Elham J., et al.
Published: (2024)

COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark
by: Chintapatla, Ishant, et al.
Published: (2025)

Privacy-Aware Document Visual Question Answering
by: Tito, Rubèn, et al.
Published: (2023)

Describe Anything Model for Visual Question Answering on Text-rich Images
by: Vu, Yen-Linh, et al.
Published: (2025)

RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
by: Wang, Yuduo, et al.
Published: (2023)

Towards Self-Explainable Document Visual Question Answering with Chain-of-Explanation Predictions
by: Indrehus, Kjetil, et al.
Published: (2026)

Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion
by: Chen, Peiyuan, et al.
Published: (2024)

ReasonDrive: Efficient Visual Question Answering for Autonomous Vehicles with Reasoning-Enhanced Small Vision-Language Models
by: Chahe, Amirhosein, et al.
Published: (2025)

Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review
by: Hartsock, Iryna, et al.
Published: (2024)

Improving Video Question Answering through query-based frame selection
by: Patil, Himanshu, et al.
Published: (2026)

RECODE: Reasoning Through Code Generation for Visual Question Answering
by: Shen, Junhong, et al.
Published: (2025)

SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering
by: Li, Bingxin
Published: (2025)

Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
by: Mamaghan, Amir Mohammad Karimi, et al.
Published: (2024)

Exploring Diverse Methods in Visual Question Answering
by: Li, Panfeng, et al.
Published: (2024)

Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering
by: Zhu, He, et al.
Published: (2024)

PromptHub: Enhancing Multi-Prompt Visual In-Context Learning with Locality-Aware Fusion, Concentration and Alignment
by: Luo, Tianci, et al.
Published: (2026)

Federated Learning with Instance-Dependent Noisy Label
by: Wang, Lei, et al.
Published: (2023)

ViInfographicVQA: A Benchmark for Single and Multi-image Visual Question Answering on Vietnamese Infographics
by: Van-Dinh, Tue-Thu, et al.
Published: (2025)

Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks
by: Singh, Simranjit, et al.
Published: (2024)

Taming Cross-Domain Representation Variance in Federated Prototype Learning with Heterogeneous Data Domains
by: Wang, Lei, et al.
Published: (2024)

ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts
by: Du, Sinan, et al.
Published: (2024)

Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types
by: Sinha, Neelabh, et al.
Published: (2024)

Glyph: Scaling Context Windows via Visual-Text Compression
by: Cheng, Jiale, et al.
Published: (2025)

MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering
by: Li, Xu, et al.
Published: (2025)

Compressor-VLA: Instruction-Guided Visual Token Compression for Efficient Robotic Manipulation
by: Gao, Juntao, et al.
Published: (2025)

LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement
by: Bian, Jieming, et al.
Published: (2024)

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
by: Romero, David, et al.
Published: (2024)

Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few
by: Wen, Qishuai, et al.
Published: (2025)

Explore until Confident: Efficient Exploration for Embodied Question Answering
by: Ren, Allen Z., et al.
Published: (2024)

Efficient Conditional Diffusion Model with Probability Flow Sampling for Image Super-resolution
by: Yuan, Yutao, et al.
Published: (2024)

Enriching Knowledge Distillation with Intra-Class Contrastive Learning
by: Yuan, Hua, et al.
Published: (2025)

Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective
by: Wen, Qishuai, et al.
Published: (2024)