| Main Authors: | Petrova, Nora; Burden, John |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.20813 |
Similar Items
The Missing Red Line: How Commercial Pressure Erodes AI Safety Boundaries
by: Petrova, Nora, et al.
Published: (2026)
Evaluating AI Evaluation: Perils and Prospects
by: Burden, John
Published: (2024)
Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework
by: Petrova, Nora, et al.
Published: (2026)
I Spy With My Model's Eye: Visual Search as a Behavioural Test for MLLMs
by: Burden, John, et al.
Published: (2025)
Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture
by: Burden, John, et al.
Published: (2025)
Framing the Game: How Context Shapes LLM Decision-Making
by: Robinson, Isaac, et al.
Published: (2025)
Empirical Evaluation of the Implicit Hitting Set Approach for Weighted CSPs
by: Petrova, Aleksandra, et al.
Published: (2025)
Stress-Testing Model Specs Reveals Character Differences among Language Models
by: Zhang, Jifan, et al.
Published: (2025)
Conversational Complexity for Assessing Risk in Large Language Models
by: Burden, John, et al.
Published: (2024)
Token Alignment via Character Matching for Subword Completion
by: Athiwaratkun, Ben, et al.
Published: (2024)
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
by: Zhang, Jiawei, et al.
Published: (2025)
Behavioural Analysis of Alignment Faking
by: Hadida, Nathaniel Mitrani, et al.
Published: (2026)
From Abstract to Actionable: Pairwise Shapley Values for Explainable AI
by: Xu, Jiaxin, et al.
Published: (2025)
Inferring Capabilities from Task Performance with Bayesian Triangulation
by: Burden, John, et al.
Published: (2023)
BehAVE: Behaviour Alignment of Video Game Encodings
by: Rašajski, Nemanja, et al.
Published: (2024)
Anytime Cooperative Implicit Hitting Set Solving
by: Rollón, Emma, et al.
Published: (2025)
Towards Generalisable Imitation Learning Through Conditioned Transition Estimation and Online Behaviour Alignment
by: Gavenski, Nathan, et al.
Published: (2026)
Evaluating Stability of Unreflective Alignment
by: Lucassen, James, et al.
Published: (2024)
From Stories to Cities to Games: A Qualitative Evaluation of Behaviour Planning
by: Abdelwahed, Mustafa F., et al.
Published: (2026)
Entropy-Aware Structural Alignment for Zero-Shot Handwritten Chinese Character Recognition
by: Luo, Qiuming, et al.
Published: (2026)
Alignment Revisited: Are Large Language Models Consistent in Stated and Revealed Preferences?
by: Gu, Zhuojun, et al.
Published: (2025)
A Revealed Preference Framework for AI Alignment
by: Suleymanov, Elchin
Published: (2026)
CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds
by: Wang, Lei, et al.
Published: (2024)
Why Deep Jacobian Spectra Separate: Depth-Induced Scaling and Singular-Vector Alignment
by: Haas, Nathanaël, et al.
Published: (2026)
Geometric Analysis of Token Selection in Multi-Head Attention
by: Mudarisov, Timur, et al.
Published: (2026)
Interactive AI Alignment: Specification, Process, and Evaluation Alignment
by: Terry, Michael, et al.
Published: (2023)
Operationalizing Pluralistic Values in Large Language Model Alignment Reveals Trade-offs in Safety, Inclusivity, and Model Behavior
by: Ali, Dalia, et al.
Published: (2025)
Evaluating Cognitive Age Alignment in Interactive AI Agents
by: Shen, Yifan, et al.
Published: (2026)
Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models
by: Nair, Inderjeet, et al.
Published: (2026)
Learning to Align: Addressing Character Frequency Distribution Shifts in Handwritten Text Recognition
by: Kaliosis, Panagiotis, et al.
Published: (2025)
Behaviour Distillation
by: Lupu, Andrei, et al.
Published: (2024)
An Evaluation of Cultural Value Alignment in LLM
by: Sukiennik, Nicholas, et al.
Published: (2025)
Pluralistic Off-policy Evaluation and Alignment
by: Huang, Chengkai, et al.
Published: (2025)
Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization
by: Ma, Yuhang, et al.
Published: (2024)
StratMem-Bench: Evaluating Strategic Memory Use in Virtual Character Conversation Beyond Factual Recall
by: Wu, Yerong, et al.
Published: (2026)
If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs
by: Fan, Siqi, et al.
Published: (2025)
From Language Models over Tokens to Language Models over Characters
by: Vieira, Tim, et al.
Published: (2024)
Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants
by: Galatolo, Alessio, et al.
Published: (2025)
Assessing Domain-Level Susceptibility to Emergent Misalignment from Narrow Finetuning
by: Mishra, Abhishek, et al.
Published: (2026)
Can Brain Signals Reveal Inner Alignment with Human Languages?
by: Han, William, et al.
Published: (2022)