| Main Authors: | Wang, Cunxiang, Ning, Ruoxi, Pan, Boqi, Wu, Tonghui, Guo, Qipeng, Deng, Cheng, Bao, Guangsheng, Hu, Xiangkun, Zhang, Zheng, Wang, Qian, Zhang, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2403.12766 |
Similar Items
ASTRA-QA: A Benchmark for Abstract Question Answering over Documents
by: Wang, Shu, et al.
Published: (2026)
How Likely Do LLMs with CoT Mimic Human Reasoning?
by: Bao, Guangsheng, et al.
Published: (2024)
UDA: Unsupervised Debiasing Alignment for Pair-wise LLM-as-a-Judge
by: Zhang, Yang, et al.
Published: (2025)
DocTabQA: Answering Questions from Long Documents Using Tables
by: Wang, Haochen, et al.
Published: (2024)
ClimaQA_SLO - Slovenian Climate Question-Answering Benchmark
by: Ferk Ovčjak, Monika, et al.
Published: (2025)
Revisiting Greedy Decoding for Visual Question Answering: A Calibration Perspective
by: Chen, Boqi, et al.
Published: (2026)
PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models
by: Li, Guangwei, et al.
Published: (2025)
RespondeoQA: a Benchmark for Bilingual Latin-English Question Answering
by: Hudspeth, Marisa, et al.
Published: (2026)
DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards
by: Kartha, Aaryaman, et al.
Published: (2025)
HCT-QA: A Benchmark for Question Answering on Human-Centric Tables
by: Ahmad, Mohammad S., et al.
Published: (2025)
BEnQA: A Question Answering and Reasoning Benchmark for Bengali and English
by: Shafayat, Sheikh, et al.
Published: (2024)
SensorQA: A Question Answering Benchmark for Daily-Life Monitoring
by: Reichman, Benjamin, et al.
Published: (2025)
MedExQA: Medical Question Answering Benchmark with Multiple Explanations
by: Kim, Yunsoo, et al.
Published: (2024)
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
by: Li, Haopeng, et al.
Published: (2024)
Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models
by: Min, Qingkai, et al.
Published: (2024)
FairMedQA: Benchmarking Bias in Large Language Models for Medical Question Answering
by: Xiao, Ying, et al.
Published: (2025)
CodeRepoQA: A Large-scale Benchmark for Software Engineering Question Answering
by: Hu, Ruida, et al.
Published: (2024)
NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario
by: Qian, Tianwen, et al.
Published: (2023)
FabricQA-Extractor: A Question Answering System to Extract Information from Documents using Natural Language Questions
by: Wang, Qiming, et al.
Published: (2024)
DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering
by: Wang, Haochen, et al.
Published: (2025)
ReCoQA: A Benchmark for Tool-Augmented and Multi-Step Reasoning in Real Estate Question and Answering
by: Zhang, Yindong, et al.
Published: (2026)
ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios
by: Pan, Changzai, et al.
Published: (2026)
MizanQA: Benchmarking Large Language Models on Moroccan Legal Question Answering
by: Bahaj, Adil, et al.
Published: (2025)
CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering
by: Li, Zongxi, et al.
Published: (2025)
DisastQA: A Comprehensive Benchmark for Evaluating Question Answering in Disaster Management
by: Chen, Zhitong, et al.
Published: (2026)
AmharicStoryQA: A Multicultural Story Question Answering Benchmark in Amharic
by: Azime, Israel Abebe, et al.
Published: (2026)
InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts
by: Xie, Tianchi, et al.
Published: (2025)
LaMP-QA: A Benchmark for Personalized Long-form Question Answering
by: Salemi, Alireza, et al.
Published: (2025)
FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models
by: Zhu, Andrew, et al.
Published: (2024)
Differentiating Choices via Commonality for Multiple-Choice Question Answering
by: Deng, Wenqing, et al.
Published: (2024)
RA-QA: A Benchmarking System for Respiratory Audio Question Answering Under Real-World Heterogeneity
by: Bertolino, Gaia A., et al.
Published: (2026)
PolQA: Polish Question Answering Dataset
by: Rybak, Piotr, et al.
Published: (2022)
VoQA: Visual-only Question Answering
by: An, Jianing, et al.
Published: (2025)
JDocQA: Japanese Document Question Answering Dataset for Generative Language Models
by: Onami, Eri, et al.
Published: (2024)
Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents
by: Wu, Shiwei, et al.
Published: (2024)
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering
by: Wang, Yanling, et al.
Published: (2025)
Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering
by: Dong, Kuicai, et al.
Published: (2025)
KET-QA: A Dataset for Knowledge Enhanced Table Question Answering
by: Hu, Mengkang, et al.
Published: (2024)
MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering
by: Alonso, Iñigo, et al.
Published: (2024)
MovieRecapsQA: A Multimodal Open-Ended Video Question-Answering Benchmark
by: Shaar, Shaden, et al.
Published: (2026)