| Main Author: | Brophy, Matthew |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.00415 |
Similar Items
Moral Alignment for LLM Agents
by: Tennant, Elizaveta, et al.
Published: (2024)
Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment
by: Kwon, Jea, et al.
Published: (2025)
Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment
by: Kim, Jiseon, et al.
Published: (2025)
Social Catalysts, Not Moral Agents: The Illusion of Alignment in LLM Societies
by: Hu, Yueqing, et al.
Published: (2026)
Attributing Responsibility in AI-Induced Incidents: A Computational Reflective Equilibrium Framework for Accountability
by: Ge, Yunfei, et al.
Published: (2024)
Alignment Is Not Enough: A Relational Framework for Moral Standing in Human-AI Interaction
by: Pasandi, Faezeh B., et al.
Published: (2026)
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
by: Han, Shanshan
Published: (2024)
LLM Safety Alignment is Divergence Estimation in Disguise
by: Haldar, Rajdeep, et al.
Published: (2025)
The Alignment Target Problem: Divergent Moral Judgments of Humans, AI Systems, and Their Designers
by: Chen, Benjamin Minhao, et al.
Published: (2026)
Critically Engaged Pragmatism: A Scientific Norm and Social, Pragmatist Epistemology for AI Science Evaluation Tools
by: Lee, Carole J.
Published: (2026)
SynLang and Symbiotic Epistemology: A Manifesto for Conscious Human-AI Collaboration
by: Kapusta, Jan
Published: (2025)
Antisocial Analagous Behavior, Alignment and Human Impact of Google AI Systems: Evaluating through the lens of modified Antisocial Behavior Criteria by Human Interaction, Independent LLM Analysis, and AI Self-Reflection
by: Ogilvie, Alan D.
Published: (2024)
Normative Moral Pluralism for AI: A Framework for Deliberation in Complex Moral Contexts
by: Yaacov, David-Doron
Published: (2025)
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
by: Nghiem, Huy, et al.
Published: (2025)
Disentangling AI Alignment: A Structured Taxonomy Beyond Safety and Ethics
by: Baum, Kevin
Published: (2025)
Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)
Hybrid Approaches for Moral Value Alignment in AI Agents: a Manifesto
by: Tennant, Elizaveta, et al.
Published: (2023)
LLM Safety for Children
by: Rath, Prasanjit, et al.
Published: (2025)
Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment
by: Vida, Karina, et al.
Published: (2024)
The AI Alignment Paradox
by: West, Robert, et al.
Published: (2024)
Belief in the Machine: Investigating Epistemological Blind Spots of Language Models
by: Suzgun, Mirac, et al.
Published: (2024)
Simple Role Assignment is Extraordinarily Effective for Safety Alignment
by: Ziheng, Zhou, et al.
Published: (2026)
AI Safety is Stuck in Technical Terms -- A System Safety Response to the International AI Safety Report
by: Dobbe, Roel
Published: (2025)
Toward an African Agenda for AI Safety
by: Segun, Samuel T., et al.
Published: (2025)
LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion
by: Zhou, Guanghao, et al.
Published: (2026)
Enriching Moral Perspectives on AI: Concepts of Trust amongst Africans
by: Amugongo, Lameck Mbangula, et al.
Published: (2025)
International Agreements on AI Safety: Review and Recommendations for a Conditional AI Safety Treaty
by: Scholefield, Rebecca, et al.
Published: (2025)
Rethinking AI Cultural Alignment
by: Bravansky, Michal, et al.
Published: (2025)
An Evaluation of Cultural Value Alignment in LLM
by: Sukiennik, Nicholas, et al.
Published: (2025)
Societal Alignment Frameworks Can Improve LLM Alignment
by: Stańczak, Karolina, et al.
Published: (2025)
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
by: Zhou, Zhenhong, et al.
Published: (2024)
Safety Cases: A Scalable Approach to Frontier AI Safety
by: Hilton, Benjamin, et al.
Published: (2025)
Safety Cases: How to Justify the Safety of Advanced AI Systems
by: Clymer, Joshua, et al.
Published: (2024)
Bridging the Gap in the Responsible AI Divides
by: Gyevnár, Bálint, et al.
Published: (2026)
Artificial Intelligence (AI) and the Relationship between Agency, Autonomy, and Moral Patiency
by: Formosa, Paul, et al.
Published: (2025)
Implicit Humanization in Everyday LLM Moral Judgments
by: Ayad, Hoda, et al.
Published: (2026)
Justifications for Democratizing AI Alignment and Their Prospects
by: Steingrüber, André, et al.
Published: (2025)
Ontology of Belief Diversity: A Community-Based Epistemological Approach
by: Fischella, Tyler, et al.
Published: (2024)
Concrete Problems in AI Safety, Revisited
by: Raji, Inioluwa Deborah, et al.
Published: (2023)
Demystify, Use, Reflect: Preparing students to be informed LLM-users
by: Chandrashekar, Nikitha Donekal, et al.
Published: (2025)