| Main Authors: | Wang, Sihan, Zhao, Jiayi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.17231 |
Similar Items
Spherical Steering: Geometry-Aware Activation Rotation for Language Models
by: You, Zejia, et al.
Published: (2026)
Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention
by: Jin, Zehao, et al.
Published: (2026)
HyperSteer: Activation Steering at Scale with Hypernetworks
by: Sun, Jiuding, et al.
Published: (2025)
Steering Language Models With Activation Engineering
by: Turner, Alexander Matt, et al.
Published: (2023)
Steer Like the LLM: Activation Steering that Mimics Prompting
by: Heyman, Geert, et al.
Published: (2026)
The Information Geometry of Softmax: Probing and Steering
by: Park, Kiho, et al.
Published: (2026)
ROAST: Rollout-based On-distribution Activation Steering Technique
by: Su, Xuanbo, et al.
Published: (2026)
Programming Refusal with Conditional Activation Steering
by: Lee, Bruce W., et al.
Published: (2024)
SAKE: Steering Activations for Knowledge Editing
by: Scialanga, Marco, et al.
Published: (2025)
LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts
by: Gu, Zhuohan, et al.
Published: (2024)
Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs
by: Hegazy, Amr, et al.
Published: (2025)
Steering MoE LLMs via Expert (De)Activation
by: Fayyaz, Mohsen, et al.
Published: (2025)
Understanding How CodeLLMs (Mis)Predict Types with Activation Steering
by: Lucchetti, Francesca, et al.
Published: (2024)
Endogenous Resistance to Activation Steering in Language Models
by: McKenzie, Alex, et al.
Published: (2026)
Activation Steering via Generative Causal Mediation
by: Sankaranarayanan, Aruna, et al.
Published: (2026)
Activation Steering for Synthetic Data Generation: The Role of Diversity in Downstream Safety Detection
by: Deshpande, Vijeta, et al.
Published: (2026)
Extracting Unlearned Information from LLMs with Activation Steering
by: Seyitoğlu, Atakan, et al.
Published: (2024)
Steering Llama 2 via Contrastive Activation Addition
by: Panickssery, Nina, et al.
Published: (2023)
Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering
by: Bigelow, Eric, et al.
Published: (2025)
Extending Activation Steering to Broad Skills and Multiple Behaviours
by: van der Weij, Teun, et al.
Published: (2024)
Improving Instruction-Following in Language Models through Activation Steering
by: Stolfo, Alessandro, et al.
Published: (2024)
On the Geometry of Positional Encodings in Transformers
by: Cirrincione, Giansalvo
Published: (2026)
FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts
by: Wang, Xinyi, et al.
Published: (2025)
SteerConf: Steering LLMs for Confidence Elicitation
by: Zhou, Ziang, et al.
Published: (2025)
ContextFocus: Activation Steering for Contextual Faithfulness in Large Language Models
by: Anand, Nikhil, et al.
Published: (2026)
Multi-property Steering of Large Language Models with Dynamic Activation Composition
by: Scalena, Daniel, et al.
Published: (2024)
Interpretable Steering of Large Language Models with Feature Guided Activation Additions
by: Soo, Samuel, et al.
Published: (2025)
Functionality-Oriented LLM Merging on the Fisher–Rao Manifold
by: Wang, Jiayu, et al.
Published: (2026)
Conceptors for Semantic Steering
by: Triantafyllopoulos, Ilias, et al.
Published: (2026)
FGGM: Fisher-Guided Gradient Masking for Continual Learning
by: Tan, Chao-Hong, et al.
Published: (2026)
Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM
by: Solgi, Ryan, et al.
Published: (2025)
Universal Activation Verbalizer: A Unified Framework for Cross-Model Activation Explanation
by: Zhao, Haiyan, et al.
Published: (2026)
Towards Understanding Steering Strength
by: Taimeskhanov, Magamed, et al.
Published: (2026)
SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs
by: Siu, Vincent, et al.
Published: (2025)
SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought
by: Batra, Shourya, et al.
Published: (2025)
Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers
by: Yan, Hao, et al.
Published: (2026)
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
by: Cao, Yuanpu, et al.
Published: (2024)
Compositional Steering of Large Language Models with Steering Tokens
by: Radevski, Gorjan, et al.
Published: (2026)
Predicting Where Steering Vectors Succeed
by: Billa, Jayadev
Published: (2026)
Steering Language Models with Weight Arithmetic
by: Fierro, Constanza, et al.
Published: (2025)