| Main Authors: | Kim, Jiseon, Kwon, Jea, Vecchietti, Luiz Felipe, Oh, Alice, Cha, Meeyoung |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.10886 |
Similar Items
Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment
by: Kwon, Jea, et al.
Published: (2025)
Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions
by: Kim, Jiseon, et al.
Published: (2026)
How Training Data Shapes the Use of Parametric and In-Context Knowledge in Language Models
by: Kim, Minsung, et al.
Published: (2025)
Uncovering Factor Level Preferences to Improve Human-Model Alignment
by: Oh, Juhyun, et al.
Published: (2024)
Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment
by: Vida, Karina, et al.
Published: (2024)
Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models
by: Costa, Davi Bastos, et al.
Published: (2025)
Social Catalysts, Not Moral Agents: The Illusion of Alignment in LLM Societies
by: Hu, Yueqing, et al.
Published: (2026)
Training-Free Cultural Alignment of Large Language Models via Persona Disagreement
by: Kiet, Huynh Trung, et al.
Published: (2026)
German General Social Survey Personas: A Survey-Derived Persona Prompt Collection for Population-Aligned LLM Studies
by: Rupprecht, Jens, et al.
Published: (2025)
RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity
by: Shin, Jisu, et al.
Published: (2025)
KoBBQ: Korean Bias Benchmark for Question Answering
by: Jin, Jiho, et al.
Published: (2023)
Societal Alignment Frameworks Can Improve LLM Alignment
by: Stańczak, Karolina, et al.
Published: (2025)
Culturally Grounded Personas in Large Language Models: Characterization and Alignment with Socio-Psychological Value Frameworks
by: Greco, Candida M., et al.
Published: (2026)
Scaling Law in LLM Simulated Personality: More Detailed and Realistic Persona Profile Is All You Need
by: Bai, Yuqi, et al.
Published: (2025)
When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas
by: Backmann, Steffen, et al.
Published: (2025)
The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate
by: Oh, Juhyun, et al.
Published: (2024)
LLM Generated Persona is a Promise with a Catch
by: Li, Ang, et al.
Published: (2025)
Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models?
by: Jinnai, Yuu
Published: (2024)
Synthetic Reader Panels: Tournament-Based Ideation with LLM Personas for Autonomous Publishing
by: Zimmerman, Fred
Published: (2026)
PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment
by: Oh, Jihwan, et al.
Published: (2026)
The Need for a Socially-Grounded Persona Framework for User Simulation
by: Venkit, Pranav Narayanan, et al.
Published: (2026)
Steering at the Source: Style Modulation Heads for Robust Persona Control
by: Izawa, Yoshihiro, et al.
Published: (2026)
Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs
by: Oh, Gyutaek, et al.
Published: (2025)
LLM Agents Predict Social Media Reactions but Do Not Outperform Text Classifiers: Benchmarking Simulation Accuracy Using 120K+ Personas of 1511 Humans
by: Bojic, Ljubisa, et al.
Published: (2026)
ProgressGym: Alignment with a Millennium of Moral Progress
by: Qiu, Tianyi, et al.
Published: (2024)
Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment
by: Sauter, Adrian, et al.
Published: (2026)
Moral Mazes in the Era of LLMs
by: Nguyen, Dang, et al.
Published: (2026)
Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation
by: Shin, Jisu, et al.
Published: (2025)
From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents
by: Qu, Jinxian, et al.
Published: (2026)
Machine Learning for Detection and Analysis of Novel LLM Jailbreaks
by: Hawkins, John, et al.
Published: (2025)
A Tale of Two Identities: An Ethical Audit of Human and AI-Crafted Personas
by: Venkit, Pranav Narayanan, et al.
Published: (2025)
Empirical Evidence for Alignment Faking in a Small LLM and Prompt-Based Mitigation Techniques
by: Koorndijk, Jeanice
Published: (2025)
EconCausal: A Context-Aware Economic Reasoning Benchmark for Large Language Models
by: Lee, Donggyu, et al.
Published: (2025)
Moral Outrage Shapes Commitments Beyond Attention: Multimodal Moral Emotions on YouTube in Korea and the US
by: Park, Seongchan, et al.
Published: (2026)
Embracing Dialectic Intersubjectivity: Coordination of Different Perspectives in Content Analysis with LLM Persona Simulation
by: Kang, Taewoo, et al.
Published: (2025)
MORALISE: A Structured Benchmark for Moral Alignment in Visual Language Models
by: Lin, Xiao, et al.
Published: (2025)
Moral Alignment for LLM Agents
by: Tennant, Elizaveta, et al.
Published: (2024)
Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
by: Biancotti, Claudia, et al.
Published: (2024)
Widespread Gender and Pronoun Bias in Moral Judgments Across LLMs
by: Fernandes, Gustavo Lúcius, et al.
Published: (2026)
Scopes of Alignment
by: Varshney, Kush R., et al.
Published: (2025)