:: Library Catalog

Bibliographic Details
Main Authors:	Healey, Jennifer; Byrum, Laurie; Akhtar, Md Nadeem; Bhargava, Surabhi; Sinha, Moumita
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.03053

Similar Items

Evaluating Nuanced Bias in Large Language Model Free Response Answers
by: Healey, Jennifer, et al.
Published: (2024)

Bias Beyond Borders: Political Ideology Evaluation and Steering in Multilingual LLMs
by: Nadeem, Afrozah, et al.
Published: (2026)

Steering Towards Fairness: Mitigating Political Bias in LLMs
by: Nadeem, Afrozah, et al.
Published: (2025)

Framing Political Bias in Multilingual LLMs Across Pakistani Languages
by: Nadeem, Afrozah, et al.
Published: (2025)

Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
by: Su, Jinyan, et al.
Published: (2025)

Generating Leakage-Free Benchmarks for Robust RAG Evaluation
by: Liu, Jiayi, et al.
Published: (2026)

Human Bias in the Face of AI: Examining Human Judgment Against Text Labeled as AI Generated
by: Zhu, Tiffany, et al.
Published: (2024)

Abstractive Text Summarization for Contemporary Sanskrit Prose: Issues and Challenges
by: Sinha, Shagun
Published: (2025)

Bengali Text Classification: An Evaluation of Large Language Model Approaches
by: Hoque, Md Mahmudul, et al.
Published: (2026)

Characterising the Creative Process in Humans and Large Language Models
by: Nath, Surabhi S., et al.
Published: (2024)

No LLM is Free From Bias: A Comprehensive Study of Bias Evaluation in Large Language Models
by: Kumar, Charaka Vinayak, et al.
Published: (2025)

Alleviating Choice Supportive Bias in LLM with Reasoning Dependency Generation
by: Zhuang, Nan, et al.
Published: (2025)

Fairness Evaluation and Inference Level Mitigation in LLMs
by: Nadeem, Afrozah, et al.
Published: (2025)

Impacts of Racial Bias in Historical Training Data for News AI
by: Bhargava, Rahul, et al.
Published: (2025)

CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
by: Yao, Jing, et al.
Published: (2024)

BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
by: Xu, Xin, et al.
Published: (2025)

ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
by: Elangovan, Aparna, et al.
Published: (2024)

Bias in Text Embedding Models
by: Rakivnenko, Vasyl, et al.
Published: (2024)

CEA-LIST at CheckThat! 2025: Evaluating LLMs as Detectors of Bias and Opinion in Text
by: Elbouanani, Akram, et al.
Published: (2025)

Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues
by: Lu, Dongxu, et al.
Published: (2025)

Enhancing Vision Models for Text-Heavy Content Understanding and Interaction
by: TG, Adithya, et al.
Published: (2024)

Intent-conditioned and Non-toxic Counterspeech Generation using Multi-Task Instruction Tuning with RLAIF
by: Hengle, Amey, et al.
Published: (2024)

Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation
by: Zhang, Junhao, et al.
Published: (2025)

EROS: Entity-Driven Controlled Policy Document Summarization
by: Singh, Joykirat, et al.
Published: (2024)

SemEval 2024 -- Task 10: Emotion Discovery and Reasoning its Flip in Conversation (EDiReF)
by: Kumar, Shivani, et al.
Published: (2024)

ArxEval: Evaluating Retrieval and Generation in Language Models for Scientific Literature
by: Sinha, Aarush, et al.
Published: (2025)

Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT
by: Tao, Zhen, et al.
Published: (2024)

PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models
by: Tan, Haochen, et al.
Published: (2024)

I Need Help! Evaluating LLM's Ability to Ask for Users' Support: A Case Study on Text-to-SQL Generation
by: Wu, Cheng-Kuang, et al.
Published: (2024)

Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations
by: Jin, Jiho, et al.
Published: (2025)

PREF: Reference-Free Evaluation of Personalised Text Generation in LLMs
by: Fu, Xiao, et al.
Published: (2025)

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models
by: Zhu, Wanrong, et al.
Published: (2024)

Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback
by: Whitehill, Jacob, et al.
Published: (2023)

Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs
by: Lu, Yuxiao, et al.
Published: (2024)

HumBEL: A Human-in-the-Loop Approach for Evaluating Demographic Factors of Language Models in Human-Machine Conversations
by: Sicilia, Anthony, et al.
Published: (2023)

LuxVeri at GenAI Detection Task 1: Inverse Perplexity Weighted Ensemble for Robust Detection of AI-Generated Text across English and Multilingual Contexts
by: Mobin, Md Kamrujjaman, et al.
Published: (2025)

Gender Bias in LLM-generated Interview Responses
by: Kong, Haein, et al.
Published: (2024)

From Reddit to Generative AI: Evaluating Large Language Models for Anxiety Support Fine-tuned on Social Media Data
by: Kursuncu, Ugur, et al.
Published: (2025)

ATGen: A Framework for Active Text Generation
by: Tsvigun, Akim, et al.
Published: (2025)

Are Bias Evaluation Methods Biased?
by: Berrayana, Lina, et al.
Published: (2025)