
Bibliographic Details
Main Authors:	Delannoy, Lorenzo; Delannoy, Niels
Format:	Digital resource
Language:	English
Published:	Zenodo 2026
Subjects:	Large Language Models; AI Safety; AI Alignment; Behavioral Psychology; LLM Behavior; Model Evaluation; AI Ethics; Psychological Profiling; Chroma Method; Fine-tuning; RLHF
Online Access:	https://doi.org/10.5281/zenodo.18732152

Table of Contents:

This framework proposes a four-layer model to explain the behavioral patterns of Large Language Models (LLMs) as socio-psychological artifacts rather than purely technical systems. The model identifies four stacked layers of human influence:

1. Layer One - Data: Cultural and ideological background embedded in the training corpus
2. Layer Two - Teams: Psychology, stress patterns, and worldview of the humans who build and fine-tune the models
3. Layer Three - Alignment: Explicit safety rules, policies, and editorial filters imposed on model outputs
4. Layer Four - Model Behavior: The emergent "personality" (observable style, biases, and refusal patterns)

Through empirical case studies applying psychological profiling methodologies to LLM interactions, we demonstrate how these layers produce distinct behavioral signatures, including stress response patterns, systematic political and moral asymmetries, linguistic bypass mechanisms (e.g., the "ENIGMA Protocol"), and variations in pathologization and therapeutic framing. The framework draws on existing benchmarks (political positioning tasks, social deduction games such as Werewolf) showing that modern LLMs exhibit stable, model-specific behavioral profiles that cannot be explained by capability differences alone.

Keywords: Large Language Models, Model Evaluation, RLHF, AI Safety, AI Ethics, Behavioral Psychology, Psychological Profiling, AI Alignment, Chroma Method, LLM Behavior, Fine-tuning

Similar Items