:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Bingquan; Liu, Xiaoxiao; Wang, Yuchi; Zhou, Lei; Xie, Qianqian; Wang, Benyou
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Computers and Society
Online Access:	https://arxiv.org/abs/2511.14783

Similar Items

PrinciplismQA: A Philosophy-Grounded Approach to Assessing LLM-Human Clinical Medical Ethics Alignment
by: Hong, Chang, et al.
Published: (2025)

MIRA: A Bilingual Benchmark for Medical Information Response Audit
by: Xu, Mengyu, et al.
Published: (2026)

Counterspeech for Mitigating the Influence of Media Bias: Comparing Human and LLM-Generated Responses
by: Lin, Luyang, et al.
Published: (2025)

Do LLMs Triage Like Clinicians? A Dynamic Study of Outpatient Referral
by: Liu, Xiaoxiao, et al.
Published: (2025)

From ChatGPT, DALL-E 3 to Sora: How has Generative AI Changed Digital Humanities Research and Services?
by: Liu, Jiangfeng, et al.
Published: (2024)

Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States
by: Xiao, Yang, et al.
Published: (2025)

LLM Agents for Education: Advances and Applications
by: Chu, Zhendong, et al.
Published: (2025)

Handling Students Dropouts in an LLM-driven Interactive Online Course Using Language Models
by: Wang, Yuanchun, et al.
Published: (2025)

In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores
by: Tang, Zeyu, et al.
Published: (2026)

Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil
by: Locatelli, Marcelo Sartori, et al.
Published: (2024)

LLM-Generated or Human-Written? Comparing Review and Non-Review Papers on ArXiv
by: Elazar, Yanai, et al.
Published: (2026)

Imperfectly Cooperative Human-AI Interactions: Comparing the Impacts of Human and AI Attributes in Simulated and User Studies
by: Cohen, Myke C., et al.
Published: (2026)

Towards New Benchmark for AI Alignment & Sentiment Analysis in Socially Important Issues: A Comparative Study of Human and LLMs in the Context of AGI
by: Bojic, Ljubisa, et al.
Published: (2025)

The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies
by: Zhou, Jiaxu, et al.
Published: (2025)

Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti
by: Oni, Mangsura Kabir, et al.
Published: (2025)

Beyond English: Unveiling Multilingual Bias in LLM Copyright Compliance
by: Chen, Yupeng, et al.
Published: (2025)

LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
by: Deng, Wenlong, et al.
Published: (2024)

The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks
by: Møller, Anders Giovanni, et al.
Published: (2023)

Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study
by: Xu, Liuchang, et al.
Published: (2024)

Psychometric Comparability of LLM-Based Digital Twins
by: Zhang, Yufei, et al.
Published: (2025)

LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education
by: Weissburg, Iain, et al.
Published: (2024)

SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization
by: Huang, Yue, et al.
Published: (2025)

LLM or Human? Perceptions of Trust and Information Quality in Research Summaries
by: Akpinar, Nil-Jana, et al.
Published: (2026)

Facts are Harder Than Opinions -- A Multilingual, Comparative Analysis of LLM-Based Fact-Checking Reliability
by: Saju, Lorraine, et al.
Published: (2025)

From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents
by: Yu, Jifan, et al.
Published: (2024)

IDEAlign: Comparing Large Language Models to Human Experts in Open-ended Interpretive Annotations
by: Nam, Hyunji, et al.
Published: (2025)

Misalignment of LLM-Generated Personas with Human Perceptions in Low-Resource Settings
by: Prama, Tabia Tanzin, et al.
Published: (2025)

Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings
by: Hong, Harbin, et al.
Published: (2025)

PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models
by: Bao, Han, et al.
Published: (2026)

How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation
by: Xiao, Yang, et al.
Published: (2023)

EduIllustrate: Towards Scalable Automated Generation Of Multimodal Educational Content
by: Bi, Shuzhen, et al.
Published: (2026)

Humans or LLMs as the Judge? A Study on Judgement Biases
by: Chen, Guiming Hardy, et al.
Published: (2024)

Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios
by: Xu, Shaochen, et al.
Published: (2024)

MoVa: Towards Generalizable Classification of Human Morals and Values
by: Chen, Ziyu, et al.
Published: (2025)

AuditWen: An Open-Source Large Language Model for Audit
by: Huang, Jiajia, et al.
Published: (2024)

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
by: Li, Nathaniel, et al.
Published: (2024)

ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education
by: Wang, Kevin, et al.
Published: (2023)

Multilingual Prompting for Improving LLM Generation Diversity
by: Wang, Qihan, et al.
Published: (2025)

An LLM Agent for Automatic Geospatial Data Analysis
by: Chen, Yuxing, et al.
Published: (2024)

Compounding Disadvantage: Auditing Intersectional Bias in LLM-Generated Explanations Across Indian and American STEM Education
by: Gupta, Amogh, et al.
Published: (2026)