:: Library Catalog

Bibliographic Details
Main Authors:	Lu, Dawn; Rimsky, Nina
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2402.00402

Similar Items

Steering Llama 2 via Contrastive Activation Addition
by: Panickssery, Nina, et al.
Published: (2023)

AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
by: Fathullah, Yassir, et al.
Published: (2023)

SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs
by: Siu, Vincent, et al.
Published: (2025)

Steering Awareness: Detecting Activation Steering from Within
by: Rivera, Joshua Fonseca, et al.
Published: (2025)

Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions
by: Kang, Diancheng, et al.
Published: (2026)

Forbidden Facts: An Investigation of Competing Objectives in Llama-2
by: Wang, Tony T., et al.
Published: (2023)

HyperSteer: Activation Steering at Scale with Hypernetworks
by: Sun, Jiuding, et al.
Published: (2025)

TinyLlama: An Open-Source Small Language Model
by: Zhang, Peiyuan, et al.
Published: (2024)

MGH Radiology Llama: A Llama 3 70B Model for Radiology
by: Shi, Yucheng, et al.
Published: (2024)

Steering Towards Fairness: Mitigating Political Bias in LLMs
by: Nadeem, Afrozah, et al.
Published: (2025)

Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
by: Ackerman, Christopher, et al.
Published: (2024)

Steer Like the LLM: Activation Steering that Mimics Prompting
by: Heyman, Geert, et al.
Published: (2026)

Activation Scaling for Steering and Interpreting Language Models
by: Stoehr, Niklas, et al.
Published: (2024)

Fusion Steering: Prompt-Specific Activation Control
by: Chang, Waldemar, et al.
Published: (2025)

RepIt: Steering Language Models with Concept-Specific Refusal Vectors
by: Siu, Vincent, et al.
Published: (2025)

When Wording Steers the Evaluation: Framing Bias in LLM judges
by: Hwang, Yerin, et al.
Published: (2026)

Cross-Lingual Activation Steering for Multilingual Language Models
by: Pokharel, Rhitabrat, et al.
Published: (2026)

Bias Beyond Borders: Political Ideology Evaluation and Steering in Multilingual LLMs
by: Nadeem, Afrozah, et al.
Published: (2026)

Programming Refusal with Conditional Activation Steering
by: Lee, Bruce W., et al.
Published: (2024)

SAKE: Steering Activations for Knowledge Editing
by: Scialanga, Marco, et al.
Published: (2025)

Hallucination reduction with CASAL: Contrastive Activation Steering For Amortized Learning
by: Wannan, et al.
Published: (2025)

DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion
by: Li, Yu, et al.
Published: (2024)

Open Llama2 Model for the Lithuanian Language
by: Nakvosas, Artūras, et al.
Published: (2024)

Endogenous Resistance to Activation Steering in Language Models
by: McKenzie, Alex, et al.
Published: (2026)

Investigating Gender Bias in LLM-Generated Stories via Psychological Stereotypes
by: Masoudian, Shahed, et al.
Published: (2025)

On Effects of Steering Latent Representation for Large Language Model Unlearning
by: Huu-Tien, Dang, et al.
Published: (2024)

What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal
by: Cheng, Stephen, et al.
Published: (2026)

ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models
by: Guang, Jiahui, et al.
Published: (2026)

LinEAS: End-to-end Learning of Activation Steering with a Distributional Loss
by: Rodriguez, Pau, et al.
Published: (2025)

Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs
by: Siddique, Zara, et al.
Published: (2025)

CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
by: Lv, Hang, et al.
Published: (2025)

Extracting Unlearned Information from LLMs with Activation Steering
by: Seyitoğlu, Atakan, et al.
Published: (2024)

The Llama 3 Herd of Models
by: Grattafiori, Aaron, et al.
Published: (2024)

Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction
by: Xu, Ziyang, et al.
Published: (2024)

EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
by: Xu, Haolei, et al.
Published: (2025)

JetMoE: Reaching Llama2 Performance with 0.1M Dollars
by: Shen, Yikang, et al.
Published: (2024)

Mitigating Content Effects on Reasoning in Language Models through Fine-Grained Activation Steering
by: Valentino, Marco, et al.
Published: (2025)

Llama-Nemotron: Efficient Reasoning Models
by: Bercovich, Akhiad, et al.
Published: (2025)

Extending Activation Steering to Broad Skills and Multiple Behaviours
by: van der Weij, Teun, et al.
Published: (2024)

How Prevalent is Gender Bias in ChatGPT? -- Exploring German and English ChatGPT Responses
by: Urchs, Stefanie, et al.
Published: (2023)