Staff View: :: Library Catalog

Bibliographic Details
Main Author:	Pan, Jonathan
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence; Cryptography and Security
Online Access:	https://arxiv.org/abs/2601.12286

_version_	1866914263114711040
author	Pan, Jonathan
author_facet	Pan, Jonathan
contents	The increasing prevalence of Large Language Models (LLMs) demands effective safeguards for their operation, particularly concerning their tendency to generate out-of-context responses. A key challenge is accurately detecting when LLMs stray from expected conversational norms, manifesting as topic shifts, factual inaccuracies, or outright hallucinations. Traditional anomaly detection is difficult to apply directly to contextual semantics. This paper outlines our experiment exploring the use of Representation Engineering (RepE) and a One-Class Support Vector Machine (OCSVM) to identify subspaces within the internal states of LLMs that represent a specific context. By training the OCSVM on in-context examples, we establish a robust boundary within the LLM's hidden-state latent space. We evaluate our approach with two open-source LLMs, Llama and Qwen models, in a specific contextual domain. Our approach entailed identifying the optimal layers within the LLM's internal state subspaces that strongly associate with the context of interest. Our evaluation showed promising results in identifying the subspace for a specific context. Aside from being useful in detecting in-context or out-of-context conversation threads, this research contributes to the study of better interpreting LLMs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_12286
institution	arXiv
publishDate	2026
record_format	arxiv
title	Conversational Context Classification: A Representation Engineering Approach
topic	Computation and Language; Artificial Intelligence; Cryptography and Security
url	https://arxiv.org/abs/2601.12286
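
Illustrative note on the abstract: the record describes fitting a One-Class SVM on hidden states from in-context examples and using the learned boundary to flag out-of-context inputs. The sketch below shows only that general idea with scikit-learn's `OneClassSVM`. It is not the paper's code: the vectors are synthetic stand-ins for hidden states taken from one LLM layer, and the dimension, sample counts, distribution shift, and the `nu` and `gamma` settings are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
dim = 8  # hypothetical feature width; real hidden states are far wider

# Synthetic stand-ins for per-prompt hidden-state vectors (assumption:
# in-context prompts cluster together, out-of-context prompts are shifted).
in_context_train = rng.normal(loc=0.0, scale=1.0, size=(500, dim))
in_context_test = rng.normal(loc=0.0, scale=1.0, size=(100, dim))
out_of_context_test = rng.normal(loc=4.0, scale=1.0, size=(100, dim))

# Fit the scaler and the one-class model on in-context examples only.
scaler = StandardScaler().fit(in_context_train)
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
ocsvm.fit(scaler.transform(in_context_train))


def is_in_context(vectors):
    """Return a boolean array: True where the model predicts an inlier (+1)."""
    return ocsvm.predict(scaler.transform(vectors)) == 1


# Fraction of each held-out set that falls inside the learned boundary.
in_rate = is_in_context(in_context_test).mean()
out_rate = is_in_context(out_of_context_test).mean()
print(f"accepted in-context: {in_rate:.2f}, accepted out-of-context: {out_rate:.2f}")
```

With real models, the synthetic arrays would be replaced by hidden states extracted from a chosen layer, and the layer search mentioned in the abstract would amount to repeating this fit per layer and comparing held-out acceptance rates.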