| Main Authors: | Imamura, Kenji, Ideuchi, Masao, Fujita, Atsushi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.29340 |
Similar Items
CADEL: A Corpus of Administrative Web Documents for Japanese Entity Linking
by: Higashiyama, Shohei, et al.
Published: (2026)
ATD-Trans: A Geographically Grounded Japanese-English Travelogue Translation Dataset
by: Higashiyama, Shohei, et al.
Published: (2026)
AnswerCarefully: A Dataset for Improving the Safety of Japanese LLM Output
by: Suzuki, Hisami, et al.
Published: (2025)
Evaluating and Calibrating LLM Confidence on Questions with Multiple Correct Answers
by: Wang, Yuhan, et al.
Published: (2026)
MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks
by: Veuthey, Jaime Raldua, et al.
Published: (2025)
Focusing on Students, not Machines: Grounded Question Generation and Automated Answer Grading
by: Meyer, Gérôme, et al.
Published: (2025)
Language-free Experience at Expo 2025 Osaka
by: Paul, Michael, et al.
Published: (2026)
BPQA Dataset: Evaluating How Well Language Models Leverage Blood Pressures to Answer Biomedical Questions
by: Hang, Chi, et al.
Published: (2025)
Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering
by: Molfese, Francesco Maria, et al.
Published: (2025)
A Dataset of Open-Domain Question Answering with Multiple-Span Answers
by: Luo, Zhiyi, et al.
Published: (2024)
No Answer Needed: Predicting LLM Answer Accuracy from Question-Only Linear Probes
by: Cencerrado, Iván Vicente Moreno, et al.
Published: (2025)
Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?
by: Balepur, Nishant, et al.
Published: (2024)
From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation
by: Zhou, Chengliang, et al.
Published: (2025)
Can LLMs Grade Short-Answer Reading Comprehension Questions : An Empirical Study with a Novel Dataset
by: Henkel, Owen, et al.
Published: (2023)
EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs
by: Naeem, Numaan, et al.
Published: (2025)
The Uneven Impact of Post-Training Quantization in Machine Translation
by: Marie, Benjamin, et al.
Published: (2025)
MiRAGE: A Multiagent Framework for Generating Multimodal Multihop Question-Answer Dataset for RAG Evaluation
by: Sahu, Chandan Kumar, et al.
Published: (2026)
Consensus or Conflict? Fine-Grained Evaluation of Conflicting Answers in Question-Answering
by: Nachshoni, Eviatar, et al.
Published: (2025)
Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions
by: Labat, Léo, et al.
Published: (2026)
Evaluating Answer Reranking Strategies in Time-sensitive Question Answering
by: Kardan, Mehmet, et al.
Published: (2025)
DEEPAMBIGQA: Ambiguous Multi-hop Questions for Benchmarking LLM Answer Completeness
by: Ji, Jiabao, et al.
Published: (2025)
Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs
by: Belikova, Julia, et al.
Published: (2025)
Automatic Feedback Generation for Short Answer Questions using Answer Diagnostic Graphs
by: Furuhashi, Momoka, et al.
Published: (2025)
Evaluation Methodology for Large Language Models for Multilingual Document Question and Answer
by: Kahana, Adar, et al.
Published: (2024)
QGen Studio: An Adaptive Question-Answer Generation, Training and Evaluation Platform
by: Moses, Movina, et al.
Published: (2025)
Integrated Framework for LLM Evaluation with Answer Generation
by: Lee, Sujeong, et al.
Published: (2025)
Automatic Question & Answer Generation Using Generative Large Language Model (LLM)
by: Ehsan, Md. Alvee, et al.
Published: (2025)
Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG
by: Qiu, Longpeng, et al.
Published: (2025)
LLMs Provide Unstable Answers to Legal Questions
by: Blair-Stanek, Andrew, et al.
Published: (2025)
Rehearsing Answers to Probable Questions with Perspective-Taking
by: Shih, Yung-Yu, et al.
Published: (2024)
Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers
by: Yona, Gal, et al.
Published: (2024)
A Dataset for Evaluating LLM-based Evaluation Functions for Research Question Extraction Task
by: Fujisaki, Yuya, et al.
Published: (2024)
When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
by: Yao, Siyang, et al.
Published: (2026)
Comparative Analysis of 47 Context-Based Question Answer Models Across 8 Diverse Datasets
by: Muneeb, Muhammad, et al.
Published: (2025)
TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models
by: Iranmanesh, Reihaneh, et al.
Published: (2026)
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
by: Wiegreffe, Sarah, et al.
Published: (2024)
Decomposed Prompting to Answer Questions on a Course Discussion Board
by: Jaipersaud, Brandon, et al.
Published: (2024)
Controllable Decontextualization of Yes/No Question and Answers into Factual Statements
by: Mo, Lingbo, et al.
Published: (2024)
Automatic Question-Answer Generation for Long-Tail Knowledge
by: Kumar, Rohan, et al.
Published: (2024)
CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering
by: Li, Zongxia, et al.
Published: (2024)