| Main Authors: | Wu, Xiangyu, Chi, Zhouyang, Yang, Yang, Lu, Jianfeng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.04255 |
Similar Items
Solution for OOD-CV UNICORN Challenge 2024 Object Detection Assistance LLM Counting Ability Improvement
by: Chi, Zhouyang, et al.
Published: (2024)
The Solution for the CVPR2023 NICE Image Captioning Challenge
by: Wu, Xiangyu, et al.
Published: (2023)
The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023
by: Huang, Yurui, et al.
Published: (2024)
Text as Any-Modality for Zero-Shot Classification by Consistent Prompt Tuning
by: Wu, Xiangyu, et al.
Published: (2025)
Multimodal Classification via Total Correlation Maximization
by: Yu, Feng, et al.
Published: (2026)
The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge
by: Chao, Dian, et al.
Published: (2024)
Multimodal Rationales for Explainable Visual Question Answering
by: Li, Kun, et al.
Published: (2024)
1st Place Solution to the 1st SkatingVerse Challenge
by: Sun, Tao, et al.
Published: (2024)
Multimodal Classification via Modal-Aware Interactive Enhancement
by: Jiang, Qing-Yuan, et al.
Published: (2024)
BERT-VQA: Visual Question Answering on Plots
by: Vu, Tai, et al.
Published: (2025)
The Solution for the GAIIC2024 RGB-TIR object detection Challenge
by: Wu, Xiangyu, et al.
Published: (2024)
VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning
by: Huang, Muye, et al.
Published: (2024)
First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge
by: Peng, Yingzhe, et al.
Published: (2024)
Unifying Image Processing as Visual Prompting Question Answering
by: Liu, Yihao, et al.
Published: (2023)
Selectively Answering Visual Questions
by: Eisenschlos, Julian Martin, et al.
Published: (2024)
MEGC2026: Micro-Expression Grand Challenge on Visual Question Answering
by: Fan, Xinqi, et al.
Published: (2026)
Reconstruction as a Bridge for Event-Based Visual Question Answering
by: Lou, Hanyue, et al.
Published: (2025)
Multi-Label Test-Time Adaptation with Bound Entropy Minimization
by: Wu, Xiangyu, et al.
Published: (2025)
Questioning the Stability of Visual Question Answering
by: Rosenfeld, Amir, et al.
Published: (2025)
First Place Solution to the MLCAS 2025 GWFSS Challenge: The Devil is in the Detail and Minority
by: Cao, Songliang, et al.
Published: (2025)
Hallucination Benchmark in Medical Visual Question Answering
by: Wu, Jinge, et al.
Published: (2024)
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
by: Zhang, Xiaoman, et al.
Published: (2023)
MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale
by: Gai, Xiaotang, et al.
Published: (2024)
ConFoThinking: Consolidated Focused Attention Driven Thinking for Visual Question Answering
by: Wu, Zhaodong, et al.
Published: (2026)
Targeted Visual Prompting for Medical Visual Question Answering
by: Tascon-Morales, Sergio, et al.
Published: (2024)
Visual Robustness Benchmark for Visual Question Answering (VQA)
by: Ishmam, Md Farhan, et al.
Published: (2024)
Visually Interpretable Subtask Reasoning for Visual Question Answering
by: Cheng, Yu, et al.
Published: (2025)
Question-Aware Gaussian Experts for Audio-Visual Question Answering
by: Kim, Hongyeob, et al.
Published: (2025)
FVOS for MOSE Track of 4th PVUW Challenge: 3rd Place Solution
by: Wang, Mengjiao, et al.
Published: (2025)
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
by: Tang, Jingqun, et al.
Published: (2024)
3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation
by: Wu, Ruipu, et al.
Published: (2024)
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks
by: Lee, Jusung, et al.
Published: (2024)
Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking
by: Yang, Cheng-Yen, et al.
Published: (2025)
DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes
by: Al-Mohannadi, Aisha, et al.
Published: (2026)
VoQA: Visual-only Question Answering
by: An, Jianing, et al.
Published: (2025)
Evaluating Variance in Visual Question Answering Benchmarks
by: SR, Nikitha
Published: (2025)
Fully Authentic Visual Question Answering Dataset from Online Communities
by: Chen, Chongyan, et al.
Published: (2023)
Query-Guided Spatial-Temporal-Frequency Interaction for Music Audio-Visual Question Answering
by: Li, Kun, et al.
Published: (2026)
3rd Place Solution for VisDA 2021 Challenge -- Universally Domain Adaptive Image Recognition
by: Liao, Haojin, et al.
Published: (2021)
Saliency Guided Longitudinal Medical Visual Question Answering
by: Wu, Jialin, et al.
Published: (2025)