Bibliographic Details
Main Authors: Day, Huw, Jezierska, Adrianna, Woodgate, Jessica
Format: Preprint
Published: 2026
Subjects: Human-Computer Interaction, Artificial Intelligence
Online Access: https://arxiv.org/abs/2603.01942
author Day, Huw
Jezierska, Adrianna
Woodgate, Jessica
contents Large Language Models have intensified the scale and strategic manipulation of political discourse on social media, leading to conflict escalation. The existing literature largely focuses on platform-led moderation as a countermeasure. In this paper, we propose a user-centric view of "jailbreaking" as an emergent, non-violent de-escalation practice. Online users engage with suspected LLM-powered accounts to circumvent large language model safeguards, exposing automated behaviour and disrupting the circulation of misleading narratives.
format Preprint
id arxiv_https___arxiv_org_abs_2603_01942
institution arXiv
publishDate 2026
record_format arxiv
title Ignore All Previous Instructions: Jailbreaking as a de-escalatory peace building practise to resist LLM social media bots
topic Human-Computer Interaction
Artificial Intelligence
url https://arxiv.org/abs/2603.01942