MARC View :: Library Catalog


Bibliographic Details
Main Authors:	Abdullaev, Laziz U.; Wong, Noelle Y. L.; Lee, Ryan T. Z.; Jiang, Shiqi; Nguyen, Khoi N. M.; Nguyen, Tan M.
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.02237

_version_	1866914620083535872
author	Abdullaev, Laziz U.; Wong, Noelle Y. L.; Lee, Ryan T. Z.; Jiang, Shiqi; Nguyen, Khoi N. M.; Nguyen, Tan M.
author_facet	Abdullaev, Laziz U.; Wong, Noelle Y. L.; Lee, Ryan T. Z.; Jiang, Shiqi; Nguyen, Khoi N. M.; Nguyen, Tan M.
contents	Representation steering offers a lightweight mechanism for controlling the behavior of large language models (LLMs) by intervening on internal activations at inference time. Most existing methods rely on a single global steering direction, typically obtained via difference-in-means over contrastive datasets. This approach implicitly assumes that the target concept is homogeneously represented across the embedding space. In practice, however, LLM representations can be highly non-homogeneous, exhibiting clustered, context-dependent structure, which renders global steering directions brittle. In this work, we view representation steering through the lens of optimal transport (OT), noting that standard difference-in-means steering implicitly corresponds to the OT map between two identical distributions with differing first moments, yielding a global translation. To relax this restrictive assumption, we theoretically model source and target representations as Gaussian mixture models and formulate steering as a discrete OT problem between semantic latent clusters. From the resulting transport plan, we derive an explicit, input-dependent steering map via barycentric projection, producing a smooth, kernel-weighted combination of cluster-level shifts. We term this method Concept Heterogeneity-aware Representation Steering (CHaRS). Through numerous experimental settings, we show that CHaRS yields more effective behavioral control than global steering.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_02237
institution	arXiv
publishDate	2026
record_format	arxiv
title	Concept Heterogeneity-aware Representation Steering
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2603.02237
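
Illustrative note on the abstract: the steps it names (a global difference-in-means direction, a discrete OT problem between clusters, barycentric projection, and a kernel-weighted combination of cluster-level shifts) can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names, the entropic Sinkhorn solver used in place of an exact discrete OT solver, the Gaussian kernel, and the toy centroids are all hypothetical choices made for illustration. Cluster centroids are taken as given rather than fitted with a Gaussian mixture model.

```python
import numpy as np

def diff_in_means(source, target):
    # Global steering direction: difference of the two sample means.
    return target.mean(axis=0) - source.mean(axis=0)

def sinkhorn_plan(a, b, cost, reg=0.05, n_iter=200):
    # Entropic OT between cluster weights a (source) and b (target).
    # Assumption: Sinkhorn is swapped in for an exact discrete OT solver.
    # The cost is rescaled by its maximum, which must be nonzero.
    K = np.exp(-cost / (reg * cost.max()))
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def cluster_shifts(mu, nu, a, b):
    # Squared Euclidean cost between source centroids mu and target centroids nu.
    cost = ((mu[:, None, :] - nu[None, :, :]) ** 2).sum(-1)
    plan = sinkhorn_plan(a, b, cost)
    # Barycentric projection: each source centroid maps to the
    # plan-weighted average of the target centroids.
    bary = (plan / a[:, None]) @ nu
    return bary - mu

def steer(x, mu, shifts, bandwidth=1.0):
    # Input-dependent map: Gaussian-kernel (softmax) weights over the
    # source centroids blend the cluster-level shifts.
    d2 = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    logits = -d2 / (2 * bandwidth**2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return x + w @ shifts

# Hypothetical toy setup: two clusters whose shifts point in opposite directions.
mu = np.array([[0.0, 0.0], [10.0, 0.0]])   # source centroids
nu = np.array([[0.0, 1.0], [10.0, -1.0]])  # target centroids
a = b = np.array([0.5, 0.5])               # cluster weights

shifts = cluster_shifts(mu, nu, a, b)
steered = steer(mu, mu, shifts)
global_direction = diff_in_means(mu, nu)
```

In this toy setup the two cluster shifts cancel, so the global difference-in-means direction is zero and moves nothing, while the cluster-aware map still carries each source centroid to its matched target centroid. That is the kind of heterogeneity the abstract describes as making a single global direction brittle.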
