:: Library Catalog

Bibliographic Details
Main Authors:	Imamura, Kenji; Ideuchi, Masao; Fujita, Atsushi
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2605.29340

Similar Items

CADEL: A Corpus of Administrative Web Documents for Japanese Entity Linking
by: Higashiyama, Shohei, et al.
Published: (2026)

ATD-Trans: A Geographically Grounded Japanese-English Travelogue Translation Dataset
by: Higashiyama, Shohei, et al.
Published: (2026)

AnswerCarefully: A Dataset for Improving the Safety of Japanese LLM Output
by: Suzuki, Hisami, et al.
Published: (2025)

Evaluating and Calibrating LLM Confidence on Questions with Multiple Correct Answers
by: Wang, Yuhan, et al.
Published: (2026)

MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks
by: Veuthey, Jaime Raldua, et al.
Published: (2025)

Focusing on Students, not Machines: Grounded Question Generation and Automated Answer Grading
by: Meyer, Gérôme, et al.
Published: (2025)

Language-free Experience at Expo 2025 Osaka
by: Paul, Michael, et al.
Published: (2026)

BPQA Dataset: Evaluating How Well Language Models Leverage Blood Pressures to Answer Biomedical Questions
by: Hang, Chi, et al.
Published: (2025)

Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering
by: Molfese, Francesco Maria, et al.
Published: (2025)

A Dataset of Open-Domain Question Answering with Multiple-Span Answers
by: Luo, Zhiyi, et al.
Published: (2024)

No Answer Needed: Predicting LLM Answer Accuracy from Question-Only Linear Probes
by: Cencerrado, Iván Vicente Moreno, et al.
Published: (2025)

Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?
by: Balepur, Nishant, et al.
Published: (2024)

From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation
by: Zhou, Chengliang, et al.
Published: (2025)

Can LLMs Grade Short-Answer Reading Comprehension Questions : An Empirical Study with a Novel Dataset
by: Henkel, Owen, et al.
Published: (2023)

EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs
by: Naeem, Numaan, et al.
Published: (2025)

The Uneven Impact of Post-Training Quantization in Machine Translation
by: Marie, Benjamin, et al.
Published: (2025)

MiRAGE: A Multiagent Framework for Generating Multimodal Multihop Question-Answer Dataset for RAG Evaluation
by: Sahu, Chandan Kumar, et al.
Published: (2026)

Consensus or Conflict? Fine-Grained Evaluation of Conflicting Answers in Question-Answering
by: Nachshoni, Eviatar, et al.
Published: (2025)

Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions
by: Labat, Léo, et al.
Published: (2026)

Evaluating Answer Reranking Strategies in Time-sensitive Question Answering
by: Kardan, Mehmet, et al.
Published: (2025)

DEEPAMBIGQA: Ambiguous Multi-hop Questions for Benchmarking LLM Answer Completeness
by: Ji, Jiabao, et al.
Published: (2025)

Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs
by: Belikova, Julia, et al.
Published: (2025)

Automatic Feedback Generation for Short Answer Questions using Answer Diagnostic Graphs
by: Furuhashi, Momoka, et al.
Published: (2025)

Evaluation Methodology for Large Language Models for Multilingual Document Question and Answer
by: Kahana, Adar, et al.
Published: (2024)

QGen Studio: An Adaptive Question-Answer Generation, Training and Evaluation Platform
by: Moses, Movina, et al.
Published: (2025)

Integrated Framework for LLM Evaluation with Answer Generation
by: Lee, Sujeong, et al.
Published: (2025)

Automatic Question & Answer Generation Using Generative Large Language Model (LLM)
by: Ehsan, Md. Alvee, et al.
Published: (2025)

Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG
by: Qiu, Longpeng, et al.
Published: (2025)

LLMs Provide Unstable Answers to Legal Questions
by: Blair-Stanek, Andrew, et al.
Published: (2025)

Rehearsing Answers to Probable Questions with Perspective-Taking
by: Shih, Yung-Yu, et al.
Published: (2024)

Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers
by: Yona, Gal, et al.
Published: (2024)

A Dataset for Evaluating LLM-based Evaluation Functions for Research Question Extraction Task
by: Fujisaki, Yuya, et al.
Published: (2024)

When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
by: Yao, Siyang, et al.
Published: (2026)

Comparative Analysis of 47 Context-Based Question Answer Models Across 8 Diverse Datasets
by: Muneeb, Muhammad, et al.
Published: (2025)

TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models
by: Iranmanesh, Reihaneh, et al.
Published: (2026)

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
by: Wiegreffe, Sarah, et al.
Published: (2024)

Decomposed Prompting to Answer Questions on a Course Discussion Board
by: Jaipersaud, Brandon, et al.
Published: (2024)

Controllable Decontextualization of Yes/No Question and Answers into Factual Statements
by: Mo, Lingbo, et al.
Published: (2024)

Automatic Question-Answer Generation for Long-Tail Knowledge
by: Kumar, Rohan, et al.
Published: (2024)

CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering
by: Li, Zongxia, et al.
Published: (2024)