| Main Authors: | Góral, Gracjan; Winkels, Marysia; Basart, Steven |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.07667 |
Similar Items
Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models
by: Góral, Gracjan, et al.
Published: (2024)
What Matters in Hierarchical Search for Combinatorial Reasoning Problems?
by: Zawalski, Michał, et al.
Published: (2024)
Steering Language Models With Activation Engineering
by: Turner, Alexander Matt, et al.
Published: (2023)
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
by: Postmus, Joris, et al.
Published: (2024)
Endogenous Resistance to Activation Steering in Language Models
by: McKenzie, Alex, et al.
Published: (2026)
Steering Large Language Model Activations in Sparse Spaces
by: Bayat, Reza, et al.
Published: (2025)
Spherical Steering: Geometry-Aware Activation Rotation for Language Models
by: You, Zejia, et al.
Published: (2026)
Measuring Chain-of-Thought Monitorability Through Faithfulness and Verbosity
by: Meek, Austin, et al.
Published: (2025)
Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options
by: Góral, Gracjan, et al.
Published: (2024)
SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models
by: Sivakumar, Anushka, et al.
Published: (2025)
Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models
by: Yang, Jiaxi, et al.
Published: (2026)
Steering Code LLMs with Activation Directions for Language and Library Control
by: Rahman, Md Mahbubur, et al.
Published: (2026)
Improving Instruction-Following in Language Models through Activation Steering
by: Stolfo, Alessandro, et al.
Published: (2024)
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models
by: Góral, Gracjan, et al.
Published: (2025)
Interpretable Steering of Large Language Models with Feature Guided Activation Additions
by: Soo, Samuel, et al.
Published: (2025)
Multi-property Steering of Large Language Models with Dynamic Activation Composition
by: Scalena, Daniel, et al.
Published: (2024)
ContextFocus: Activation Steering for Contextual Faithfulness in Large Language Models
by: Anand, Nikhil, et al.
Published: (2026)
Activation Steering with a Feedback Controller
by: Nguyen, Dung V., et al.
Published: (2025)
HyperSteer: Activation Steering at Scale with Hypernetworks
by: Sun, Jiuding, et al.
Published: (2025)
Dynamically Scaled Activation Steering
by: Ferrando, Alex, et al.
Published: (2025)
CBMAS: Cognitive Behavioral Modeling via Activation Steering
by: Ismail, Ahmed H., et al.
Published: (2026)
Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models
by: Raza, Ali, et al.
Published: (2026)
Steer Like the LLM: Activation Steering that Mimics Prompting
by: Heyman, Geert, et al.
Published: (2026)
TADA! Tuning Audio Diffusion Models through Activation Steering
by: Staniszewski, Łukasz, et al.
Published: (2026)
Interpreting and Steering State-Space Models via Activation Subspace Bottlenecks
by: Mohan, Vamshi Sunku, et al.
Published: (2026)
Steering Protein Language Models
by: Huang, Long-Kai, et al.
Published: (2025)
MLLMEraser: Achieving Test-Time Unlearning in Multimodal Large Language Models through Activation Steering
by: Ding, Chenlu, et al.
Published: (2025)
Activation Steering for Chain-of-Thought Compression
by: Azizi, Seyedarmin, et al.
Published: (2025)
Minimizing Collateral Damage in Activation Steering
by: Nguyen, Tam, et al.
Published: (2026)
Steered LLM Activations are Non-Surjective
by: Mishra, Aayush, et al.
Published: (2026)
Zero-Shot Neural Network Evaluation with Sample-Wise Activation Patterns
by: Peng, Yameng, et al.
Published: (2026)
To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models
by: Hedström, Anna, et al.
Published: (2025)
Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing
by: Filippova, Anastasiia, et al.
Published: (2026)
Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention
by: Jin, Zehao, et al.
Published: (2026)
A Honest Cross-Validation Estimator for Prediction Performance
by: Pan, Tianyu, et al.
Published: (2025)
Honest Lying: Understanding Memory Confabulation in Reflexive Agents
by: Dixit, Prakhar, et al.
Published: (2026)
Compositional Steering of Large Language Models with Steering Tokens
by: Radevski, Gorjan, et al.
Published: (2026)
Steering Language Model Refusal with Sparse Autoencoders
by: O'Brien, Kyle, et al.
Published: (2024)
Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model
by: Eisenstein, Jacob, et al.
Published: (2022)
Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency
by: Jiang, Xinyan, et al.
Published: (2026)