| Main Author: | Fassold, Hannes |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2404.15851 |
Similar Items
Faster than real-time detection of shot boundaries, sampling structure and dynamic keyframes in video
by: Fassold, Hannes
Published: (2025)
Large Vision-Language Models for Remote Sensing Visual Question Answering
by: Siripong, Surasakdi, et al.
Published: (2024)
Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting
by: Cai, Chen, et al.
Published: (2024)
Where do Large Vision-Language Models Look at when Answering Questions?
by: Xing, Xiaoying, et al.
Published: (2025)
Analyzing the Sensitivity of Vision Language Models in Visual Question Answering
by: Shah, Monika, et al.
Published: (2025)
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks
by: Lee, Jusung, et al.
Published: (2024)
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering
by: Hao, Dongze, et al.
Published: (2024)
Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models
by: An, Wenbin, et al.
Published: (2024)
MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems
by: Xu, Quanxing, et al.
Published: (2026)
Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model
by: Kim, Taehee, et al.
Published: (2024)
POLYCHARTQA: Benchmarking Large Vision-Language Models with Multilingual Chart Question Answering
by: Xu, Yichen, et al.
Published: (2025)
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
by: Chen, Xiuyuan, et al.
Published: (2023)
How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?
by: Lando, Giuseppe, et al.
Published: (2025)
Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning
by: Zeng, Xingchen, et al.
Published: (2024)
SCRA-VQA: Summarized Caption-Rerank for Augmented Large Language Models in Visual Question Answering
by: Zhang, Yan, et al.
Published: (2025)
Research on Vision-Language Question Answering Models for Industrial Robots
by: Li, Ping, et al.
Published: (2026)
3D Question Answering via only 2D Vision-Language Models
by: Wang, Fengyun, et al.
Published: (2025)
HORNet: Task-Guided Frame Selection for Video Question Answering with Vision-Language Models
by: Bai, Xiangyu, et al.
Published: (2026)
Advancing Egocentric Video Question Answering with Multimodal Large Language Models
by: Patel, Alkesh, et al.
Published: (2025)
Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering
by: Ha, Cuong Nhat, et al.
Published: (2024)
ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering
by: Wu, Yifan, et al.
Published: (2024)
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
by: Lu, Xudong, et al.
Published: (2025)
ChineseVideoBench: Benchmarking Multi-modal Large Models for Chinese Video Question Answering
by: Nie, Yuxiang, et al.
Published: (2025)
MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering
by: Srivastava, Varun, et al.
Published: (2025)
Imp: Highly Capable Large Multimodal Models for Mobile Devices
by: Shao, Zhenwei, et al.
Published: (2024)
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering
by: Wang, Yanling, et al.
Published: (2025)
EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering
by: Li, Yanjun, et al.
Published: (2025)
JSynFlow: Japanese Synthesised Flowchart Visual Question Answering Dataset built with Large Language Models
by: Sasaki, Hiroshi
Published: (2026)
VideoDistill: Language-aware Vision Distillation for Video Question Answering
by: Zou, Bo, et al.
Published: (2024)
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering
by: Lim, Su Hyeon, et al.
Published: (2024)
ViLA: Efficient Video-Language Alignment for Video Question Answering
by: Wang, Xijun, et al.
Published: (2023)
Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models
by: Meng, Tian, et al.
Published: (2024)
Adapting Lightweight Vision Language Models for Radiological Visual Question Answering
by: Shourya, Aditya, et al.
Published: (2025)
Question-Aware Gaussian Experts for Audio-Visual Question Answering
by: Kim, Hongyeob, et al.
Published: (2025)
Selectively Answering Visual Questions
by: Eisenschlos, Julian Martin, et al.
Published: (2024)
Questioning the Stability of Visual Question Answering
by: Rosenfeld, Amir, et al.
Published: (2025)
Admitting Ignorance Helps the Video Question Answering Models to Answer
by: Li, Haopeng, et al.
Published: (2025)
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
by: Yu, Zhou, et al.
Published: (2023)
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
by: Guo, Danfeng, et al.
Published: (2024)
RSVLM-QA: A Benchmark Dataset for Remote Sensing Vision Language Model-based Question Answering
by: Zi, Xing, et al.
Published: (2025)