| Main Authors: | Gugg, Regina, Niederländer, Selina, Stöckl, Andreas, Flechl, Martin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.10639 |
Similar Items
Exploring the Impact of Personality Traits on LLM Bias and Toxicity
by: Wang, Shuo, et al.
Published: (2025)
Generative Exaggeration in LLM Social Agents: Consistency, Bias, and Toxicity
by: Nudo, Jacopo, et al.
Published: (2025)
CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios
by: Lim, Taein, et al.
Published: (2026)
Assessing Bias in Metric Models for LLM Open-Ended Generation Bias Benchmarks
by: Demchak, Nathaniel, et al.
Published: (2024)
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
by: Dekoninck, Jasper, et al.
Published: (2024)
The influence of persona and conversational task on social interactions with a LLM-controlled embodied conversational agent
by: Kroczek, Leon O. H., et al.
Published: (2024)
Persuasiveness and Bias in LLM: Investigating the Impact of Persuasiveness and Reinforcement of Bias in Language Models
by: Roy, Saumya
Published: (2025)
Beyond Consensus: Mitigating the Agreeableness Bias in LLM Judge Evaluations
by: Jain, Suryaansh, et al.
Published: (2025)
Fairness or Fluency? An Investigation into Language Bias of Pairwise LLM-as-a-Judge
by: Zhou, Xiaolin, et al.
Published: (2026)
Investigating Gender Bias in LLM-Generated Stories via Psychological Stereotypes
by: Masoudian, Shahed, et al.
Published: (2025)
Investigating the Impact of LLM Personality on Cognitive Bias Manifestation in Automated Decision-Making Tasks
by: He, Jiangen, et al.
Published: (2025)
Can LLMs Recognize Toxicity? A Structured Investigation Framework and Toxicity Metric
by: Koh, Hyukhun, et al.
Published: (2024)
Detection and Measurement of Hailstones with Multimodal Large Language Models
by: Alker, Moritz, et al.
Published: (2025)
Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation
by: Feuer, Benjamin, et al.
Published: (2026)
Social Bias in LLM-Generated Code: Benchmark and Mitigation
by: Rabbi, Fazle, et al.
Published: (2026)
IndoBias: A Dual Track Culturally Grounded Benchmark for LLMs Bias Evaluation in Indonesian Languages
by: Hanif, Ikhlasul Akmal, et al.
Published: (2026)
No LLM is Free From Bias: A Comprehensive Study of Bias Evaluation in Large Language Models
by: Kumar, Charaka Vinayak, et al.
Published: (2025)
Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning
by: Wu, Xuyang, et al.
Published: (2025)
BiasScope: Towards Automated Detection of Bias in LLM-as-a-Judge Evaluation
by: Lai, Peng, et al.
Published: (2026)
Enhancing LLM-based Hatred and Toxicity Detection with Meta-Toxic Knowledge Graph
by: Zhao, Yibo, et al.
Published: (2024)
When Wording Steers the Evaluation: Framing Bias in LLM judges
by: Hwang, Yerin, et al.
Published: (2026)
LLM Bias Evaluation: Gender, Racial, and Age Disparities in Occupational and Crime Scenarios
by: Mirza, Vishal, et al.
Published: (2024)
Evaluating and Mitigating LLM-as-a-judge Bias in Communication Systems
by: Gao, Jiaxin, et al.
Published: (2025)
When LLMs Benchmark Themselves: Deconstructing Self-Bias in Automated Evaluation
by: Xu, Wenda, et al.
Published: (2025)
Who Gets the Mic? Investigating Gender Bias in the Speaker Assignment of a Speech-LLM
by: Puhach, Dariia, et al.
Published: (2025)
Identifying and Mitigating Systemic Measurement Bias in Production LLM Inference Benchmarks
by: Chandrasekar, Ashok, et al.
Published: (2026)
Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations
by: Jin, Jiho, et al.
Published: (2025)
LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation
by: Ghosh, Himel, et al.
Published: (2026)
The Evaluation Game: Beyond Static LLM Benchmarking
by: Wang, Paul, et al.
Published: (2026)
Evaluation and Benchmarking of LLM Agents: A Survey
by: Mohammadi, Mahmoud, et al.
Published: (2025)
Who Benchmarks the Benchmarks? A Case Study of LLM Evaluation in Icelandic
by: Ingimundarson, Finnur Ágúst, et al.
Published: (2026)
Judging the Judges: A Systematic Evaluation of Bias Mitigation Strategies in LLM-as-a-Judge Pipelines
by: Soumik, Sadman Kabir
Published: (2026)
AI Benchmarks and Datasets for LLM Evaluation
by: Ivanov, Todor, et al.
Published: (2024)
Reasoning Language Models for complex assessments tasks: Evaluating parental cooperation from child protection case reports
by: Stoll, Dragan, et al.
Published: (2026)
Social Evolution of Published Text and The Emergence of Artificial Intelligence Through Large Language Models and The Problem of Toxicity and Bias
by: Khan, Arifa, et al.
Published: (2024)
Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction
by: Zhang, Danyang, et al.
Published: (2023)
On the Role of Speech Data in Reducing Toxicity Detection Bias
by: Bell, Samuel J., et al.
Published: (2024)
NC-Bench: An LLM Benchmark for Evaluating Conversational Competence
by: Moore, Robert J., et al.
Published: (2026)
Realistic Evaluation of Toxicity in Large Language Models
by: Luong, Tinh Son, et al.
Published: (2024)
BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks
by: Anupam, Sagnik, et al.
Published: (2025)