| Main Authors: | Healey, Jennifer, Byrum, Laurie, Akhtar, Md Nadeem, Bhargava, Surabhi, Sinha, Moumita |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.03053 |
Similar Items
Evaluating Nuanced Bias in Large Language Model Free Response Answers
by: Healey, Jennifer, et al.
Published: (2024)
Bias Beyond Borders: Political Ideology Evaluation and Steering in Multilingual LLMs
by: Nadeem, Afrozah, et al.
Published: (2026)
Steering Towards Fairness: Mitigating Political Bias in LLMs
by: Nadeem, Afrozah, et al.
Published: (2025)
Framing Political Bias in Multilingual LLMs Across Pakistani Languages
by: Nadeem, Afrozah, et al.
Published: (2025)
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
by: Su, Jinyan, et al.
Published: (2025)
Generating Leakage-Free Benchmarks for Robust RAG Evaluation
by: Liu, Jiayi, et al.
Published: (2026)
Human Bias in the Face of AI: Examining Human Judgment Against Text Labeled as AI Generated
by: Zhu, Tiffany, et al.
Published: (2024)
Abstractive Text Summarization for Contemporary Sanskrit Prose: Issues and Challenges
by: Sinha, Shagun
Published: (2025)
Bengali Text Classification: An Evaluation of Large Language Model Approaches
by: Hoque, Md Mahmudul, et al.
Published: (2026)
Characterising the Creative Process in Humans and Large Language Models
by: Nath, Surabhi S., et al.
Published: (2024)
No LLM is Free From Bias: A Comprehensive Study of Bias Evaluation in Large Language Models
by: Kumar, Charaka Vinayak, et al.
Published: (2025)
Alleviating Choice Supportive Bias in LLM with Reasoning Dependency Generation
by: Zhuang, Nan, et al.
Published: (2025)
Fairness Evaluation and Inference Level Mitigation in LLMs
by: Nadeem, Afrozah, et al.
Published: (2025)
Impacts of Racial Bias in Historical Training Data for News AI
by: Bhargava, Rahul, et al.
Published: (2025)
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
by: Yao, Jing, et al.
Published: (2024)
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
by: Xu, Xin, et al.
Published: (2025)
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
by: Elangovan, Aparna, et al.
Published: (2024)
Bias in Text Embedding Models
by: Rakivnenko, Vasyl, et al.
Published: (2024)
CEA-LIST at CheckThat! 2025: Evaluating LLMs as Detectors of Bias and Opinion in Text
by: Elbouanani, Akram, et al.
Published: (2025)
Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues
by: Lu, Dongxu, et al.
Published: (2025)
Enhancing Vision Models for Text-Heavy Content Understanding and Interaction
by: TG, Adithya, et al.
Published: (2024)
Intent-conditioned and Non-toxic Counterspeech Generation using Multi-Task Instruction Tuning with RLAIF
by: Hengle, Amey, et al.
Published: (2024)
Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation
by: Zhang, Junhao, et al.
Published: (2025)
EROS: Entity-Driven Controlled Policy Document Summarization
by: Singh, Joykirat, et al.
Published: (2024)
SemEval 2024 -- Task 10: Emotion Discovery and Reasoning its Flip in Conversation (EDiReF)
by: Kumar, Shivani, et al.
Published: (2024)
ArxEval: Evaluating Retrieval and Generation in Language Models for Scientific Literature
by: Sinha, Aarush, et al.
Published: (2025)
Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT
by: Tao, Zhen, et al.
Published: (2024)
PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models
by: Tan, Haochen, et al.
Published: (2024)
I Need Help! Evaluating LLM's Ability to Ask for Users' Support: A Case Study on Text-to-SQL Generation
by: Wu, Cheng-Kuang, et al.
Published: (2024)
Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations
by: Jin, Jiho, et al.
Published: (2025)
PREF: Reference-Free Evaluation of Personalised Text Generation in LLMs
by: Fu, Xiao, et al.
Published: (2025)
Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models
by: Zhu, Wanrong, et al.
Published: (2024)
Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback
by: Whitehill, Jacob, et al.
Published: (2023)
Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs
by: Lu, Yuxiao, et al.
Published: (2024)
HumBEL: A Human-in-the-Loop Approach for Evaluating Demographic Factors of Language Models in Human-Machine Conversations
by: Sicilia, Anthony, et al.
Published: (2023)
LuxVeri at GenAI Detection Task 1: Inverse Perplexity Weighted Ensemble for Robust Detection of AI-Generated Text across English and Multilingual Contexts
by: Mobin, Md Kamrujjaman, et al.
Published: (2025)
Gender Bias in LLM-generated Interview Responses
by: Kong, Haein, et al.
Published: (2024)
From Reddit to Generative AI: Evaluating Large Language Models for Anxiety Support Fine-tuned on Social Media Data
by: Kursuncu, Ugur, et al.
Published: (2025)
ATGen: A Framework for Active Text Generation
by: Tsvigun, Akim, et al.
Published: (2025)
Are Bias Evaluation Methods Biased ?
by: Berrayana, Lina, et al.
Published: (2025)