:: Library Catalog

Bibliographic Details
Main Authors:	Kong, Liangji; Joshi, Aditya; Karimi, Sarvnaz
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Computers and Society
Online Access:	https://arxiv.org/abs/2512.02251

Similar Items

A Critical Look at Meta-evaluating Summarisation Evaluation Metrics
by: Dai, Xiang, et al.
Published: (2024)

Identifying Health Risks from Family History: A Survey of Natural Language Processing Techniques
by: Dai, Xiang, et al.
Published: (2024)

Social Bias in Popular Question-Answering Benchmarks
by: Kraft, Angelie, et al.
Published: (2025)

Question-Answering (QA) Model for a Personalized Learning Assistant for Arabic Language
by: Sammoudi, Mohammad, et al.
Published: (2024)

SUKHSANDESH: An Avatar Therapeutic Question Answering Platform for Sexual Education in Rural India
by: Singh, Salam Michael, et al.
Published: (2024)

MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing
by: Agarwal, Siddhant, et al.
Published: (2024)

LangLingual: A Personalised, Exercise-oriented English Language Learning Tool Leveraging Large Language Models
by: Gupta, Sammriddh, et al.
Published: (2025)

GG-BBQ: German Gender Bias Benchmark for Question Answering
by: Satheesh, Shalaka, et al.
Published: (2025)

None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering
by: Tam, Zhi Rui, et al.
Published: (2025)

Answering Students' Questions on Course Forums Using Multiple Chain-of-Thought Reasoning and Finetuning RAG-Enabled LLM
by: Wang, Neo, et al.
Published: (2025)

Towards Unsupervised Question Answering System with Multi-level Summarization for Legal Text
by: Prabhu, M Manvith, et al.
Published: (2024)

MultiADE: A Multi-domain Benchmark for Adverse Drug Event Extraction
by: Dai, Xiang, et al.
Published: (2024)

SyllabusQA: A Course Logistics Question Answering Dataset
by: Fernandez, Nigel, et al.
Published: (2024)

Mitigating Bias for Question Answering Models by Tracking Bias Influence
by: Ma, Mingyu Derek, et al.
Published: (2023)

Reporting and Analysing the Environmental Impact of Language Models on the Example of Commonsense Question Answering with External Knowledge
by: Usmanova, Aida, et al.
Published: (2024)

PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark
by: Bahaj, Adil, et al.
Published: (2025)

Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question Answering
by: Hu, Zhanghao, et al.
Published: (2025)

CSIRO-LT at SemEval-2025 Task 11: Adapting LLMs for Emotion Recognition for Multiple Languages
by: Chen, Jiyu, et al.
Published: (2025)

Can AI Extract Antecedent Factors of Human Trust in AI? An Application of Information Extraction for Scientific Literature in Behavioural and Computer Sciences
by: McGrath, Melanie, et al.
Published: (2024)

Mental Health Equity in LLMs: Leveraging Multi-Hop Question Answering to Detect Amplified and Silenced Perspectives
by: Haider, Batool, et al.
Published: (2025)

DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models
by: Bae, Suyoung, et al.
Published: (2025)

Asking For It: Question-Answering for Predicting Rule Infractions in Online Content Moderation
by: Samory, Mattia, et al.
Published: (2025)

MahaSQuAD: Bridging Linguistic Divides in Marathi Question-Answering
by: Ghatage, Ruturaj, et al.
Published: (2024)

Multi-Hop Reasoning for Question Answering with Hyperbolic Representations
by: Welz, Simon, et al.
Published: (2025)

RephQA: Evaluating Readability of Large Language Models in Public Health Question Answering
by: Qiu, Weikang, et al.
Published: (2025)

IndicSQuAD: A Comprehensive Multilingual Question Answering Dataset for Indic Languages
by: Endait, Sharvi, et al.
Published: (2025)

Computational Analysis of Climate Policy
by: Hicks, Carolyn
Published: (2025)

Generative Debunking of Climate Misinformation
by: Zanartu, Francisco, et al.
Published: (2024)

Trust, Safety, and Accuracy: Assessing LLMs for Routine Maternity Advice
by: Divya, V Sai, et al.
Published: (2026)

SafeMath: Inference-time Safety improves Math Accuracy
by: Basu, Sagnik, et al.
Published: (2026)

Designing and Evaluating Chain-of-Hints for Scientific Question Answering
by: Jangra, Anubhav, et al.
Published: (2025)

LLMs Provide Unstable Answers to Legal Questions
by: Blair-Stanek, Andrew, et al.
Published: (2025)

Climate Change from Large Language Models
by: Zhu, Hongyin, et al.
Published: (2023)

C-QUERI: Congressional Questions, Exchanges, and Responses in Institutions Dataset
by: Rudra, Manjari, et al.
Published: (2025)

Beyond Accuracy: Diagnosing Algebraic Reasoning Failures in LLMs Across Nine Complexity Dimensions
by: Patil, Parth, et al.
Published: (2026)

Hanging in the Balance: Pivotal Moments in Crisis Counseling Conversations
by: Nguyen, Vivian, et al.
Published: (2025)

Does Scientific Writing Converge to U.S. English? Evidence from Generative AI-Assisted Publications
by: Filimonovic, Dragan, et al.
Published: (2025)

Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions
by: Zhu, Wang Bill, et al.
Published: (2025)

Automated Question Generation for Science Tests in Arabic Language Using NLP Techniques
by: Tami, Mohammad, et al.
Published: (2024)

Questionable practices in machine learning
by: Leech, Gavin, et al.
Published: (2024)