| Main Author: | Fukui, Hiroki |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.00021 |
Similar Items
A molecular clock for writing systems reveals the quantitative impact of imperial power on cultural evolution
by: Fukui, Hiroki
Published: (2026)
"Pull or Not to Pull?": Investigating Moral Biases in Leading Large Language Models Across Ethical Dilemmas
by: Ding, Junchen, et al.
Published: (2025)
Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning
by: Duan, Shitong, et al.
Published: (2023)
Alignment Backfire: Language-Dependent Reversal of Safety Interventions Across 16 Languages in LLM Multi-Agent Systems
by: Fukui, Hiroki
Published: (2026)
Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias
by: Xu, Rongwu, et al.
Published: (2024)
Semantic Consistency for Assuring Reliability of Large Language Models
by: Raj, Harsh, et al.
Published: (2023)
Alignment as Iatrogenesis: Pastoral Power, Collective Pathology, and the Structural Limits of Monolingual Safety Evaluation
by: Fukui, Hiroki
Published: (2026)
Quantifying Risk Propensities of Large Language Models: Ethical Focus and Bias Detection through Role-Play
by: Zeng, Yifan, et al.
Published: (2024)
TaCIE: Enhancing Instruction Comprehension in Large Language Models through Task-Centred Instruction Evolution
by: Yang, Jiuding, et al.
Published: (2024)
How Large Language Models are Designed to Hallucinate
by: Ackermann, Richard, et al.
Published: (2025)
Do Large Language Models Get Caught in Hofstadter-Mobius Loops?
by: Hryszko, Jaroslaw
Published: (2026)
Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts
by: Wang, Xing, et al.
Published: (2025)
LocalValueBench: A Collaboratively Built and Extensible Benchmark for Evaluating Localized Value Alignment and Ethical Safety in Large Language Models
by: Meadows, Gwenyth Isobel, et al.
Published: (2024)
From Argumentation to Deliberation: Perspectivized Stance Vectors for Fine-grained (Dis)agreement Analysis
by: Plenz, Moritz, et al.
Published: (2025)
MedSimAI: Simulation and Formative Feedback Generation to Enhance Deliberate Practice in Medical Education
by: Hicke, Yann, et al.
Published: (2025)
Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems
by: Fukui, Hiroki
Published: (2026)
Cancer Vaccine Adjuvant Name Recognition from Biomedical Literature using Large Language Models
by: Rehana, Hasin, et al.
Published: (2025)
Do GPT Language Models Suffer From Split Personality Disorder? The Advent Of Substrate-Free Psychometrics
by: Romero, Peter, et al.
Published: (2024)
Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest
by: Wu, Addison J., et al.
Published: (2026)
Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation
by: Koutcheme, Charles, et al.
Published: (2026)
Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models
by: Wang, Meiyun, et al.
Published: (2024)
Anecdoctoring: Automated Red-Teaming Across Language and Place
by: Cuevas, Alejandro, et al.
Published: (2025)
Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes
by: Gallegos, Isabel O., et al.
Published: (2024)
Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms
by: Sun, Yuxi, et al.
Published: (2025)
How Do Vision-Language Models Process Conflicting Information Across Modalities?
by: Hua, Tianze, et al.
Published: (2025)
Talking the Talk Does Not Entail Walking the Walk: On the Limits of Large Language Models in Lexical Entailment Recognition
by: Greco, Candida M., et al.
Published: (2024)
DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models
by: Fu, Jiachen, et al.
Published: (2025)
LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models
by: Frisch, Ivar, et al.
Published: (2024)
EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI
by: Kasu, Sai Kartheek Reddy
Published: (2025)
Ethical Concern Identification in NLP: A Corpus of ACL Anthology Ethics Statements
by: Karamolegkou, Antonia, et al.
Published: (2024)
A Tale of Two Identities: An Ethical Audit of Human and AI-Crafted Personas
by: Venkit, Pranav Narayanan, et al.
Published: (2025)
Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection
by: Ahmed, Ahmed Haj, et al.
Published: (2024)
On the Creativity of Large Language Models
by: Franceschelli, Giorgio, et al.
Published: (2023)
Bridging the Copyright Gap: Do Large Vision-Language Models Recognize and Respect Copyrighted Content?
by: Xu, Naen, et al.
Published: (2025)
Cross-Language Bias Examination in Large Language Models
by: Liang, Yuxuan, et al.
Published: (2025)
Do Language Models Reason Across Languages?
by: Meng, Yan, et al.
Published: (2026)
"Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most
by: Zhou, Kaitlyn, et al.
Published: (2026)
The Moral Consistency Pipeline: Continuous Ethical Evaluation for Large Language Models
by: Jamshidi, Saeid, et al.
Published: (2025)
Anticipating Innovation Using Large Language Models
by: Fenoaltea, Enrico Maria, et al.
Published: (2026)
Evaluating Large Language Models for Detecting Antisemitism
by: Patel, Jay, et al.
Published: (2025)