| Main Authors: | Zhang, Bingquan, Liu, Xiaoxiao, Wang, Yuchi, Zhou, Lei, Xie, Qianqian, Wang, Benyou |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.14783 |
Similar Items
PrinciplismQA: A Philosophy-Grounded Approach to Assessing LLM-Human Clinical Medical Ethics Alignment
by: Hong, Chang, et al.
Published: (2025)
MIRA: A Bilingual Benchmark for Medical Information Response Audit
by: Xu, Mengyu, et al.
Published: (2026)
Counterspeech for Mitigating the Influence of Media Bias: Comparing Human and LLM-Generated Responses
by: Lin, Luyang, et al.
Published: (2025)
Do LLMs Triage Like Clinicians? A Dynamic Study of Outpatient Referral
by: Liu, Xiaoxiao, et al.
Published: (2025)
From ChatGPT, DALL-E 3 to Sora: How has Generative AI Changed Digital Humanities Research and Services?
by: Liu, Jiangfeng, et al.
Published: (2024)
Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States
by: Xiao, Yang, et al.
Published: (2025)
LLM Agents for Education: Advances and Applications
by: Chu, Zhendong, et al.
Published: (2025)
Handling Students Dropouts in an LLM-driven Interactive Online Course Using Language Models
by: Wang, Yuanchun, et al.
Published: (2025)
In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores
by: Tang, Zeyu, et al.
Published: (2026)
Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil
by: Locatelli, Marcelo Sartori, et al.
Published: (2024)
LLM-Generated or Human-Written? Comparing Review and Non-Review Papers on ArXiv
by: Elazar, Yanai, et al.
Published: (2026)
Imperfectly Cooperative Human-AI Interactions: Comparing the Impacts of Human and AI Attributes in Simulated and User Studies
by: Cohen, Myke C., et al.
Published: (2026)
Towards New Benchmark for AI Alignment & Sentiment Analysis in Socially Important Issues: A Comparative Study of Human and LLMs in the Context of AGI
by: Bojic, Ljubisa, et al.
Published: (2025)
The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies
by: Zhou, Jiaxu, et al.
Published: (2025)
Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti
by: Oni, Mangsura Kabir, et al.
Published: (2025)
Beyond English: Unveiling Multilingual Bias in LLM Copyright Compliance
by: Chen, Yupeng, et al.
Published: (2025)
LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
by: Deng, Wenlong, et al.
Published: (2024)
The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks
by: Møller, Anders Giovanni, et al.
Published: (2023)
Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study
by: Xu, Liuchang, et al.
Published: (2024)
Psychometric Comparability of LLM-Based Digital Twins
by: Zhang, Yufei, et al.
Published: (2025)
LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education
by: Weissburg, Iain, et al.
Published: (2024)
SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization
by: Huang, Yue, et al.
Published: (2025)
LLM or Human? Perceptions of Trust and Information Quality in Research Summaries
by: Akpinar, Nil-Jana, et al.
Published: (2026)
Facts are Harder Than Opinions -- A Multilingual, Comparative Analysis of LLM-Based Fact-Checking Reliability
by: Saju, Lorraine, et al.
Published: (2025)
From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents
by: Yu, Jifan, et al.
Published: (2024)
IDEAlign: Comparing Large Language Models to Human Experts in Open-ended Interpretive Annotations
by: Nam, Hyunji, et al.
Published: (2025)
Misalignment of LLM-Generated Personas with Human Perceptions in Low-Resource Settings
by: Prama, Tabia Tanzin, et al.
Published: (2025)
Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings
by: Hong, Harbin, et al.
Published: (2025)
PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models
by: Bao, Han, et al.
Published: (2026)
How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation
by: Xiao, Yang, et al.
Published: (2023)
EduIllustrate: Towards Scalable Automated Generation Of Multimodal Educational Content
by: Bi, Shuzhen, et al.
Published: (2026)
Humans or LLMs as the Judge? A Study on Judgement Biases
by: Chen, Guiming Hardy, et al.
Published: (2024)
Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios
by: Xu, Shaochen, et al.
Published: (2024)
MoVa: Towards Generalizable Classification of Human Morals and Values
by: Chen, Ziyu, et al.
Published: (2025)
AuditWen: An Open-Source Large Language Model for Audit
by: Huang, Jiajia, et al.
Published: (2024)
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
by: Li, Nathaniel, et al.
Published: (2024)
ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education
by: Wang, Kevin, et al.
Published: (2023)
Multilingual Prompting for Improving LLM Generation Diversity
by: Wang, Qihan, et al.
Published: (2025)
An LLM Agent for Automatic Geospatial Data Analysis
by: Chen, Yuxing, et al.
Published: (2024)
Compounding Disadvantage: Auditing Intersectional Bias in LLM-Generated Explanations Across Indian and American STEM Education
by: Gupta, Amogh, et al.
Published: (2026)