Bibliographic Details
Main Authors: Siddique, Zara; Turner, Liam D.; Espinosa-Anke, Luis
Format: Preprint
Published: 2025
Subjects: Machine Learning, Artificial Intelligence
Online Access: https://arxiv.org/abs/2505.06262
author Siddique, Zara
Turner, Liam D.
Espinosa-Anke, Luis
contents We introduce Dialz, a framework for advancing research on steering vectors for open-source LLMs, implemented in Python. Steering vectors allow users to modify activations at inference time to amplify or weaken a 'concept', e.g. honesty or positivity, providing a more powerful alternative to prompting or fine-tuning. Dialz supports a diverse set of tasks, including creating contrastive pair datasets, computing and applying steering vectors, and visualizations. Unlike existing libraries, Dialz emphasizes modularity and usability, enabling both rapid prototyping and in-depth analysis. We demonstrate how Dialz can be used to reduce harmful outputs such as stereotypes, while also providing insights into model behaviour across different layers. We release Dialz with full documentation, tutorials, and support for popular open-source models to encourage further research in safe and controllable language generation. Dialz enables faster research cycles and facilitates insights into model interpretability, paving the way for safer, more transparent, and more reliable AI systems.
format Preprint
id arxiv_https___arxiv_org_abs_2505_06262
institution arXiv
publishDate 2025
record_format arxiv
title Dialz: A Python Toolkit for Steering Vectors
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2505.06262
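The abstract describes building steering vectors from contrastive pairs and adding them to activations at inference time. A minimal sketch of that idea follows, assuming the mean-difference method (one common way to derive such a vector) and toy 3-dimensional activations. It is not the Dialz API: the function names `steering_vector` and `apply_steering` and the `scale` parameter are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def steering_vector(positive, negative):
    """Mean difference between the activations of the two sides of each contrastive pair."""
    if not positive or len(positive) != len(negative):
        raise ValueError("need a non-empty, equal number of positive and negative activations")
    dim = len(positive[0])
    return [
        fmean(p[i] for p in positive) - fmean(n[i] for n in negative)
        for i in range(dim)
    ]


def apply_steering(hidden, vector, scale):
    """Add scale * vector to one hidden state; a negative scale weakens the concept."""
    return [h + scale * v for h, v in zip(hidden, vector)]


# Toy activations standing in for one layer's hidden states (assumed values).
positive = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]  # prompts expressing the concept
negative = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]  # prompts expressing its opposite

vector = steering_vector(positive, negative)
steered = apply_steering([0.5, 0.5, 0.5], vector, scale=2.0)
```

In a real model the activations would come from a chosen transformer layer and the addition would happen inside the forward pass; the sign and size of `scale` control whether the concept is amplified or weakened.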