| Main Authors: | Kong, Liangji; Joshi, Aditya; Karimi, Sarvnaz |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.02251 |
Similar Items
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics
by: Dai, Xiang, et al.
Published: (2024)
Identifying Health Risks from Family History: A Survey of Natural Language Processing Techniques
by: Dai, Xiang, et al.
Published: (2024)
Social Bias in Popular Question-Answering Benchmarks
by: Kraft, Angelie, et al.
Published: (2025)
Question-Answering (QA) Model for a Personalized Learning Assistant for Arabic Language
by: Sammoudi, Mohammad, et al.
Published: (2024)
SUKHSANDESH: An Avatar Therapeutic Question Answering Platform for Sexual Education in Rural India
by: Singh, Salam Michael, et al.
Published: (2024)
MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing
by: Agarwal, Siddhant, et al.
Published: (2024)
LangLingual: A Personalised, Exercise-oriented English Language Learning Tool Leveraging Large Language Models
by: Gupta, Sammriddh, et al.
Published: (2025)
GG-BBQ: German Gender Bias Benchmark for Question Answering
by: Satheesh, Shalaka, et al.
Published: (2025)
None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering
by: Tam, Zhi Rui, et al.
Published: (2025)
Answering Students' Questions on Course Forums Using Multiple Chain-of-Thought Reasoning and Finetuning RAG-Enabled LLM
by: Wang, Neo, et al.
Published: (2025)
Towards Unsupervised Question Answering System with Multi-level Summarization for Legal Text
by: Prabhu, M Manvith, et al.
Published: (2024)
MultiADE: A Multi-domain Benchmark for Adverse Drug Event Extraction
by: Dai, Xiang, et al.
Published: (2024)
SyllabusQA: A Course Logistics Question Answering Dataset
by: Fernandez, Nigel, et al.
Published: (2024)
Mitigating Bias for Question Answering Models by Tracking Bias Influence
by: Ma, Mingyu Derek, et al.
Published: (2023)
Reporting and Analysing the Environmental Impact of Language Models on the Example of Commonsense Question Answering with External Knowledge
by: Usmanova, Aida, et al.
Published: (2024)
PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark
by: Bahaj, Adil, et al.
Published: (2025)
Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question Answering
by: Hu, Zhanghao, et al.
Published: (2025)
CSIRO-LT at SemEval-2025 Task 11: Adapting LLMs for Emotion Recognition for Multiple Languages
by: Chen, Jiyu, et al.
Published: (2025)
Can AI Extract Antecedent Factors of Human Trust in AI? An Application of Information Extraction for Scientific Literature in Behavioural and Computer Sciences
by: McGrath, Melanie, et al.
Published: (2024)
Mental Health Equity in LLMs: Leveraging Multi-Hop Question Answering to Detect Amplified and Silenced Perspectives
by: Haider, Batool, et al.
Published: (2025)
DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models
by: Bae, Suyoung, et al.
Published: (2025)
Asking For It: Question-Answering for Predicting Rule Infractions in Online Content Moderation
by: Samory, Mattia, et al.
Published: (2025)
MahaSQuAD: Bridging Linguistic Divides in Marathi Question-Answering
by: Ghatage, Ruturaj, et al.
Published: (2024)
Multi-Hop Reasoning for Question Answering with Hyperbolic Representations
by: Welz, Simon, et al.
Published: (2025)
RephQA: Evaluating Readability of Large Language Models in Public Health Question Answering
by: Qiu, Weikang, et al.
Published: (2025)
IndicSQuAD: A Comprehensive Multilingual Question Answering Dataset for Indic Languages
by: Endait, Sharvi, et al.
Published: (2025)
Computational Analysis of Climate Policy
by: Hicks, Carolyn
Published: (2025)
Generative Debunking of Climate Misinformation
by: Zanartu, Francisco, et al.
Published: (2024)
Trust, Safety, and Accuracy: Assessing LLMs for Routine Maternity Advice
by: Divya, V Sai, et al.
Published: (2026)
SafeMath: Inference-time Safety improves Math Accuracy
by: Basu, Sagnik, et al.
Published: (2026)
Designing and Evaluating Chain-of-Hints for Scientific Question Answering
by: Jangra, Anubhav, et al.
Published: (2025)
LLMs Provide Unstable Answers to Legal Questions
by: Blair-Stanek, Andrew, et al.
Published: (2025)
Climate Change from Large Language Models
by: Zhu, Hongyin, et al.
Published: (2023)
C-QUERI: Congressional Questions, Exchanges, and Responses in Institutions Dataset
by: Rudra, Manjari, et al.
Published: (2025)
Beyond Accuracy: Diagnosing Algebraic Reasoning Failures in LLMs Across Nine Complexity Dimensions
by: Patil, Parth, et al.
Published: (2026)
Hanging in the Balance: Pivotal Moments in Crisis Counseling Conversations
by: Nguyen, Vivian, et al.
Published: (2025)
Does Scientific Writing Converge to U.S. English? Evidence from Generative AI-Assisted Publications
by: Filimonovic, Dragan, et al.
Published: (2025)
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions
by: Zhu, Wang Bill, et al.
Published: (2025)
Automated Question Generation for Science Tests in Arabic Language Using NLP Techniques
by: Tami, Mohammad, et al.
Published: (2024)
Questionable practices in machine learning
by: Leech, Gavin, et al.
Published: (2024)