:: Library Catalog

Bibliographic Details
Main Author:	Brophy, Matthew
Format:	Preprint
Published:	2025
Subjects:	Computers and Society; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.00415

Similar Items

Moral Alignment for LLM Agents
by: Tennant, Elizaveta, et al.
Published: (2024)

Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment
by: Kwon, Jea, et al.
Published: (2025)

Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment
by: Kim, Jiseon, et al.
Published: (2025)

Social Catalysts, Not Moral Agents: The Illusion of Alignment in LLM Societies
by: Hu, Yueqing, et al.
Published: (2026)

Attributing Responsibility in AI-Induced Incidents: A Computational Reflective Equilibrium Framework for Accountability
by: Ge, Yunfei, et al.
Published: (2024)

Alignment Is Not Enough: A Relational Framework for Moral Standing in Human-AI Interaction
by: Pasandi, Faezeh B., et al.
Published: (2026)

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
by: Han, Shanshan
Published: (2024)

LLM Safety Alignment is Divergence Estimation in Disguise
by: Haldar, Rajdeep, et al.
Published: (2025)

The Alignment Target Problem: Divergent Moral Judgments of Humans, AI Systems, and Their Designers
by: Chen, Benjamin Minhao, et al.
Published: (2026)

Critically Engaged Pragmatism: A Scientific Norm and Social, Pragmatist Epistemology for AI Science Evaluation Tools
by: Lee, Carole J.
Published: (2026)

SynLang and Symbiotic Epistemology: A Manifesto for Conscious Human-AI Collaboration
by: Kapusta, Jan
Published: (2025)

Antisocial Analagous Behavior, Alignment and Human Impact of Google AI Systems: Evaluating through the lens of modified Antisocial Behavior Criteria by Human Interaction, Independent LLM Analysis, and AI Self-Reflection
by: Ogilvie, Alan D.
Published: (2024)

Normative Moral Pluralism for AI: A Framework for Deliberation in Complex Moral Contexts
by: Yaacov, David-Doron
Published: (2025)

Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
by: Nghiem, Huy, et al.
Published: (2025)

Disentangling AI Alignment: A Structured Taxonomy Beyond Safety and Ethics
by: Baum, Kevin
Published: (2025)

Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)

Hybrid Approaches for Moral Value Alignment in AI Agents: a Manifesto
by: Tennant, Elizaveta, et al.
Published: (2023)

LLM Safety for Children
by: Rath, Prasanjit, et al.
Published: (2025)

Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment
by: Vida, Karina, et al.
Published: (2024)

The AI Alignment Paradox
by: West, Robert, et al.
Published: (2024)

Belief in the Machine: Investigating Epistemological Blind Spots of Language Models
by: Suzgun, Mirac, et al.
Published: (2024)

Simple Role Assignment is Extraordinarily Effective for Safety Alignment
by: Ziheng, Zhou, et al.
Published: (2026)

AI Safety is Stuck in Technical Terms -- A System Safety Response to the International AI Safety Report
by: Dobbe, Roel
Published: (2025)

Toward an African Agenda for AI Safety
by: Segun, Samuel T., et al.
Published: (2025)

LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion
by: Zhou, Guanghao, et al.
Published: (2026)

Enriching Moral Perspectives on AI: Concepts of Trust amongst Africans
by: Amugongo, Lameck Mbangula, et al.
Published: (2025)

International Agreements on AI Safety: Review and Recommendations for a Conditional AI Safety Treaty
by: Scholefield, Rebecca, et al.
Published: (2025)

Rethinking AI Cultural Alignment
by: Bravansky, Michal, et al.
Published: (2025)

An Evaluation of Cultural Value Alignment in LLM
by: Sukiennik, Nicholas, et al.
Published: (2025)

Societal Alignment Frameworks Can Improve LLM Alignment
by: Stańczak, Karolina, et al.
Published: (2025)

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
by: Zhou, Zhenhong, et al.
Published: (2024)

Safety Cases: A Scalable Approach to Frontier AI Safety
by: Hilton, Benjamin, et al.
Published: (2025)

Safety Cases: How to Justify the Safety of Advanced AI Systems
by: Clymer, Joshua, et al.
Published: (2024)

Bridging the Gap in the Responsible AI Divides
by: Gyevnár, Bálint, et al.
Published: (2026)

Artificial Intelligence (AI) and the Relationship between Agency, Autonomy, and Moral Patiency
by: Formosa, Paul, et al.
Published: (2025)

Implicit Humanization in Everyday LLM Moral Judgments
by: Ayad, Hoda, et al.
Published: (2026)

Justifications for Democratizing AI Alignment and Their Prospects
by: Steingrüber, André, et al.
Published: (2025)

Ontology of Belief Diversity: A Community-Based Epistemological Approach
by: Fischella, Tyler, et al.
Published: (2024)

Concrete Problems in AI Safety, Revisited
by: Raji, Inioluwa Deborah, et al.
Published: (2023)

Demystify, Use, Reflect: Preparing students to be informed LLM-users
by: Chandrashekar, Nikitha Donekal, et al.
Published: (2025)