| Main Authors: | Cho, Eunjung, Hoyle, Alexander, Hermstrüwer, Yoan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.00529 |
Similar Items
Motivation in Large Language Models
by: Nahum, Omer, et al.
Published: (2026)
Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora
by: Derner, Erik, et al.
Published: (2024)
Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea
by: Cho, Eunjung, et al.
Published: (2025)
Three Disclaimers for Safe Disclosure: A Cardwriter for Reporting the Use of Generative AI in Writing Process
by: Cho, Won Ik, et al.
Published: (2024)
The Hawthorne Effect in Reasoning Models: Evaluating and Steering Test Awareness
by: Abdelnabi, Sahar, et al.
Published: (2025)
Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning
by: Hu, Jingyu, et al.
Published: (2024)
Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles
by: Jahara, Fatima, et al.
Published: (2025)
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
by: van der Weij, Teun, et al.
Published: (2024)
TRIDENT: Benchmarking LLM Safety in Finance, Medicine, and Law
by: Hui, Zheng, et al.
Published: (2025)
TIM: A Large-Scale Dataset and large Timeline Intelligence Model for Open-domain Timeline Summarization
by: Hu, Chuanrui, et al.
Published: (2025)
Strategic Insights in Human and Large Language Model Tactics at Word Guessing Games
by: Rikters, Matīss, et al.
Published: (2024)
Evaluating GPT-3.5's Awareness and Summarization Abilities for European Constitutional Texts with Shared Topics
by: Greco, Candida M., et al.
Published: (2024)
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
by: Fan, Yu, et al.
Published: (2025)
MalAlgoQA: Pedagogical Evaluation of Counterfactual Reasoning in Large Language Models and Implications for AI in Education
by: Liu, Naiming, et al.
Published: (2024)
The Language You Ask In: Language-Conditioned Ideological Divergence in LLM Analysis of Contested Political Documents
by: Smirnov, Oleg
Published: (2026)
Agree to Disagree? A Meta-Evaluation of LLM Misgendering
by: Subramonian, Arjun, et al.
Published: (2025)
The Cambridge Law Corpus: A Dataset for Legal AI Research
by: Östling, Andreas, et al.
Published: (2023)
CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
by: An, Heajun, et al.
Published: (2026)
Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning
by: Juvekar, Kush, et al.
Published: (2025)
LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education
by: Weissburg, Iain, et al.
Published: (2024)
From Rogue to Safe AI: The Role of Explicit Refusals in Aligning LLMs with International Humanitarian Law
by: Mavi, John, et al.
Published: (2025)
Topic-aware Large Language Models for Summarizing the Lived Healthcare Experiences Described in Health Stories
by: Bilalpur, Maneesh, et al.
Published: (2025)
PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning
by: Akyürek, Afra Feyza, et al.
Published: (2025)
How Did We Get Here? Summarizing Conversation Dynamics
by: Hua, Yilun, et al.
Published: (2024)
Law in Silico: Simulating Legal Society with LLM-Based Agents
by: Wang, Yiding, et al.
Published: (2025)
Task-Dependent Evaluation of LLM Output Homogenization: A Taxonomy-Guided Framework
by: Jain, Shomik, et al.
Published: (2025)
Answering Students' Questions on Course Forums Using Multiple Chain-of-Thought Reasoning and Finetuning RAG-Enabled LLM
by: Wang, Neo, et al.
Published: (2025)
From Demographics to Survey Anchors: Evaluating LLM Agents for Modeling Retirement Attitudes
by: Garzón, Rubén, et al.
Published: (2026)
Scaling Law in LLM Simulated Personality: More Detailed and Realistic Persona Profile Is All You Need
by: Bai, Yuqi, et al.
Published: (2025)
MEDEQUALQA: Evaluating Biases in LLMs with Counterfactual Reasoning
by: Ghosh, Rajarshi, et al.
Published: (2025)
Secure On-Premise Deployment of Open-Weights Large Language Models in Radiology: An Isolation-First Architecture with Prospective Pilot Evaluation
by: Nowak, Sebastian, et al.
Published: (2026)
Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization
by: Qi, Siya, et al.
Published: (2025)
Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues
by: Scarlatos, Alexander, et al.
Published: (2025)
Evaluating LLM-Generated Legal Explanations for Regulatory Compliance in Social Media Influencer Marketing
by: Gui, Haoyang, et al.
Published: (2025)
Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States
by: Xiao, Yang, et al.
Published: (2025)
RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity
by: Shin, Jisu, et al.
Published: (2025)
Exploring Safety Alignment Evaluation of LLMs in Chinese Mental Health Dialogues via LLM-as-Judge
by: Cai, Yunna, et al.
Published: (2025)
Assessing Judging Bias in Large Reasoning Models: An Empirical Study
by: Wang, Qian, et al.
Published: (2025)
Towards Unsupervised Question Answering System with Multi-level Summarization for Legal Text
by: Prabhu, M Manvith, et al.
Published: (2024)
Problem-Solving Guide: Predicting the Algorithm Tags and Difficulty for Competitive Programming Problems
by: Kim, Juntae, et al.
Published: (2023)