| Main Authors: | Mohan, Vamshi Sunku, Gupta, Kaustubh, Das, Aneesha, Singh, Chandan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.22719 |
Similar Items
Leveraging Fog Computing for Security‐Aware Resource Allocation in Narrowband Internet of Things
by: Vamshi Sunku Mohan, et al.
Published: (2024)
VALOR: Value-Aware Revenue Uplift Modeling with Treatment-Gated Representation for B2B Sales
by: Guduguntla, Vamshi, et al.
Published: (2026)
Bayesian Concept Bottleneck Models with LLM Priors
by: Feng, Jean, et al.
Published: (2024)
Principal Prototype Analysis on Manifold for Interpretable Reinforcement Learning
by: Vamshi, Bodla Krishna, et al.
Published: (2026)
Assessing the Operational Viability of Foundation Models for Time Series Forecasting
by: Soni, Kavin, et al.
Published: (2026)
Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation
by: Prokopiou, Ioannis, et al.
Published: (2026)
Steering Large Language Model Activations in Sparse Spaces
by: Bayat, Reza, et al.
Published: (2025)
Protocode: Prototype-Driven Interpretability for Code Generation in LLMs
by: Bodla, Krishna Vamshi, et al.
Published: (2025)
Angular Steering: Behavior Control via Rotation in Activation Space
by: Vu, Hieu M., et al.
Published: (2025)
Steering Conceptual Bias via Transformer Latent-Subspace Activation
by: Sharma, Vansh, et al.
Published: (2025)
Interpretable Steering of Large Language Models with Feature Guided Activation Additions
by: Soo, Samuel, et al.
Published: (2025)
Subspace Control: Turning Constrained Model Steering into Controllable Spectral Optimization
by: Huang, Yancheng, et al.
Published: (2026)
Interpretable Reward Modeling with Active Concept Bottlenecks
by: Laguna, Sonia, et al.
Published: (2025)
Cross-Layer Subspace Coupling for LLM Compression: A Unifying Framework and Its Empirical Limits
by: Khilar, Snigdha Chandan
Published: (2026)
Characterizing the Behavior of Training Mamba-based State Space Models on GPUs
by: Baruah, Trinayan, et al.
Published: (2025)
Interpretable Prognostics with Concept Bottleneck Models
by: Forest, Florent, et al.
Published: (2024)
Discovering and Steering Interpretable Concepts in Large Generative Music Models
by: Singh, Nikhil, et al.
Published: (2025)
D-STEER - Preference Alignment Techniques Learn to Behave, not to Believe -- Beneath the Surface, DPO as Steering Vector Perturbation in Activation Space
by: Raina, Samarth, et al.
Published: (2025)
CBMAS: Cognitive Behavioral Modeling via Activation Steering
by: Ismail, Ahmed H., et al.
Published: (2026)
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
by: Zhang, Qingru, et al.
Published: (2023)
Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
by: Huang, Xinting, et al.
Published: (2025)
Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models
by: Raza, Ali, et al.
Published: (2026)
When the Coffee Feature Activates on Coffins: An Analysis of Feature Extraction and Steering for Mechanistic Interpretability
by: Ronge, Raphael, et al.
Published: (2026)
GSS: Gated Subspace Steering for Selective Memorization Mitigation in LLMs
by: Zhang, Xuanqi, et al.
Published: (2026)
Safety Subspaces are Not Linearly Distinct: A Fine-Tuning Case Study
by: Ponkshe, Kaustubh, et al.
Published: (2025)
Rethinking Interpretability in the Era of Large Language Models
by: Singh, Chandan, et al.
Published: (2024)
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
by: Wang, Peihao, et al.
Published: (2024)
Steering Language Models With Activation Engineering
by: Turner, Alexander Matt, et al.
Published: (2023)
Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models
by: Yang, Jiaxi, et al.
Published: (2026)
Interpretable Next-token Prediction via the Generalized Induction Head
by: Kim, Eunji, et al.
Published: (2024)
SafeSteer: Interpretable Safety Steering with Refusal-Evasion in LLMs
by: Ghosh, Shaona, et al.
Published: (2025)
Towards Reasonable Concept Bottleneck Models
by: Kalampalikis, Nektarios, et al.
Published: (2025)
Depth-Wise Activation Steering for Honest Language Models
by: Góral, Gracjan, et al.
Published: (2025)
Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency
by: Jiang, Xinyan, et al.
Published: (2026)
OncoReason: Structuring Clinical Reasoning in LLMs for Robust and Interpretable Survival Prediction
by: Hemadri, Raghu Vamshi, et al.
Published: (2025)
Activation Steering with a Feedback Controller
by: Nguyen, Dung V., et al.
Published: (2025)
HyperSteer: Activation Steering at Scale with Hypernetworks
by: Sun, Jiuding, et al.
Published: (2025)
Understanding In-context Learning of Addition via Activation Subspaces
by: Hu, Xinyan, et al.
Published: (2025)
Decoupled-Value Attention for Prior-Data Fitted Networks: GP Inference for Physical Equations
by: Sharma, Kaustubh, et al.
Published: (2025)
Dynamically Scaled Activation Steering
by: Ferrando, Alex, et al.
Published: (2025)