:: Library Catalog

Bibliographic Details
Main Authors:	Cho, Eunjung, Hoyle, Alexander, Hermstrüwer, Yoan
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Computers and Society
Online Access:	https://arxiv.org/abs/2509.00529

Similar Items

Motivation in Large Language Models
by: Nahum, Omer, et al.
Published: (2026)

Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora
by: Derner, Erik, et al.
Published: (2024)

Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea
by: Cho, Eunjung, et al.
Published: (2025)

Three Disclaimers for Safe Disclosure: A Cardwriter for Reporting the Use of Generative AI in Writing Process
by: Cho, Won Ik, et al.
Published: (2024)

The Hawthorne Effect in Reasoning Models: Evaluating and Steering Test Awareness
by: Abdelnabi, Sahar, et al.
Published: (2025)

Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning
by: Hu, Jingyu, et al.
Published: (2024)

Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles
by: Jahara, Fatima, et al.
Published: (2025)

AI Sandbagging: Language Models can Strategically Underperform on Evaluations
by: van der Weij, Teun, et al.
Published: (2024)

TRIDENT: Benchmarking LLM Safety in Finance, Medicine, and Law
by: Hui, Zheng, et al.
Published: (2025)

TIM: A Large-Scale Dataset and large Timeline Intelligence Model for Open-domain Timeline Summarization
by: Hu, Chuanrui, et al.
Published: (2025)

Strategic Insights in Human and Large Language Model Tactics at Word Guessing Games
by: Rikters, Matīss, et al.
Published: (2024)

Evaluating GPT-3.5's Awareness and Summarization Abilities for European Constitutional Texts with Shared Topics
by: Greco, Candida M., et al.
Published: (2024)

LEXam: Benchmarking Legal Reasoning on 340 Law Exams
by: Fan, Yu, et al.
Published: (2025)

MalAlgoQA: Pedagogical Evaluation of Counterfactual Reasoning in Large Language Models and Implications for AI in Education
by: Liu, Naiming, et al.
Published: (2024)

The Language You Ask In: Language-Conditioned Ideological Divergence in LLM Analysis of Contested Political Documents
by: Smirnov, Oleg
Published: (2026)

Agree to Disagree? A Meta-Evaluation of LLM Misgendering
by: Subramonian, Arjun, et al.
Published: (2025)

The Cambridge Law Corpus: A Dataset for Legal AI Research
by: Östling, Andreas, et al.
Published: (2023)

CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
by: An, Heajun, et al.
Published: (2026)

Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning
by: Juvekar, Kush, et al.
Published: (2025)

LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education
by: Weissburg, Iain, et al.
Published: (2024)

From Rogue to Safe AI: The Role of Explicit Refusals in Aligning LLMs with International Humanitarian Law
by: Mavi, John, et al.
Published: (2025)

Topic-aware Large Language Models for Summarizing the Lived Healthcare Experiences Described in Health Stories
by: Bilalpur, Maneesh, et al.
Published: (2025)

PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning
by: Akyürek, Afra Feyza, et al.
Published: (2025)

How Did We Get Here? Summarizing Conversation Dynamics
by: Hua, Yilun, et al.
Published: (2024)

Law in Silico: Simulating Legal Society with LLM-Based Agents
by: Wang, Yiding, et al.
Published: (2025)

Task-Dependent Evaluation of LLM Output Homogenization: A Taxonomy-Guided Framework
by: Jain, Shomik, et al.
Published: (2025)

Answering Students' Questions on Course Forums Using Multiple Chain-of-Thought Reasoning and Finetuning RAG-Enabled LLM
by: Wang, Neo, et al.
Published: (2025)

From Demographics to Survey Anchors: Evaluating LLM Agents for Modeling Retirement Attitudes
by: Garzón, Rubén, et al.
Published: (2026)

Scaling Law in LLM Simulated Personality: More Detailed and Realistic Persona Profile Is All You Need
by: Bai, Yuqi, et al.
Published: (2025)

MEDEQUALQA: Evaluating Biases in LLMs with Counterfactual Reasoning
by: Ghosh, Rajarshi, et al.
Published: (2025)

Secure On-Premise Deployment of Open-Weights Large Language Models in Radiology: An Isolation-First Architecture with Prospective Pilot Evaluation
by: Nowak, Sebastian, et al.
Published: (2026)

Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization
by: Qi, Siya, et al.
Published: (2025)

Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues
by: Scarlatos, Alexander, et al.
Published: (2025)

Evaluating LLM-Generated Legal Explanations for Regulatory Compliance in Social Media Influencer Marketing
by: Gui, Haoyang, et al.
Published: (2025)

Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States
by: Xiao, Yang, et al.
Published: (2025)

RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity
by: Shin, Jisu, et al.
Published: (2025)

Exploring Safety Alignment Evaluation of LLMs in Chinese Mental Health Dialogues via LLM-as-Judge
by: Cai, Yunna, et al.
Published: (2025)

Assessing Judging Bias in Large Reasoning Models: An Empirical Study
by: Wang, Qian, et al.
Published: (2025)

Towards Unsupervised Question Answering System with Multi-level Summarization for Legal Text
by: Prabhu, M Manvith, et al.
Published: (2024)

Problem-Solving Guide: Predicting the Algorithm Tags and Difficulty for Competitive Programming Problems
by: Kim, Juntae, et al.
Published: (2023)