Library Catalog

Bibliographic Details
Main Authors:	Mohan, Vamshi Sunku; Gupta, Kaustubh; Das, Aneesha; Singh, Chandan
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.22719

Similar Items

Leveraging Fog Computing for Security‐Aware Resource Allocation in Narrowband Internet of Things
by: Mohan, Vamshi Sunku, et al.
Published: (2024)

VALOR: Value-Aware Revenue Uplift Modeling with Treatment-Gated Representation for B2B Sales
by: Guduguntla, Vamshi, et al.
Published: (2026)

Bayesian Concept Bottleneck Models with LLM Priors
by: Feng, Jean, et al.
Published: (2024)

Principal Prototype Analysis on Manifold for Interpretable Reinforcement Learning
by: Vamshi, Bodla Krishna, et al.
Published: (2026)

Assessing the Operational Viability of Foundation Models for Time Series Forecasting
by: Soni, Kavin, et al.
Published: (2026)

Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation
by: Prokopiou, Ioannis, et al.
Published: (2026)

Steering Large Language Model Activations in Sparse Spaces
by: Bayat, Reza, et al.
Published: (2025)

Protocode: Prototype-Driven Interpretability for Code Generation in LLMs
by: Bodla, Krishna Vamshi, et al.
Published: (2025)

Angular Steering: Behavior Control via Rotation in Activation Space
by: Vu, Hieu M., et al.
Published: (2025)

Steering Conceptual Bias via Transformer Latent-Subspace Activation
by: Sharma, Vansh, et al.
Published: (2025)

Interpretable Steering of Large Language Models with Feature Guided Activation Additions
by: Soo, Samuel, et al.
Published: (2025)

Subspace Control: Turning Constrained Model Steering into Controllable Spectral Optimization
by: Huang, Yancheng, et al.
Published: (2026)

Interpretable Reward Modeling with Active Concept Bottlenecks
by: Laguna, Sonia, et al.
Published: (2025)

Cross-Layer Subspace Coupling for LLM Compression: A Unifying Framework and Its Empirical Limits
by: Khilar, Snigdha Chandan
Published: (2026)

Characterizing the Behavior of Training Mamba-based State Space Models on GPUs
by: Baruah, Trinayan, et al.
Published: (2025)

Interpretable Prognostics with Concept Bottleneck Models
by: Forest, Florent, et al.
Published: (2024)

Discovering and Steering Interpretable Concepts in Large Generative Music Models
by: Singh, Nikhil, et al.
Published: (2025)

D-STEER - Preference Alignment Techniques Learn to Behave, not to Believe -- Beneath the Surface, DPO as Steering Vector Perturbation in Activation Space
by: Raina, Samarth, et al.
Published: (2025)

CBMAS: Cognitive Behavioral Modeling via Activation Steering
by: Ismail, Ahmed H., et al.
Published: (2026)

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
by: Zhang, Qingru, et al.
Published: (2023)

Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
by: Huang, Xinting, et al.
Published: (2025)

Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models
by: Raza, Ali, et al.
Published: (2026)

When the Coffee Feature Activates on Coffins: An Analysis of Feature Extraction and Steering for Mechanistic Interpretability
by: Ronge, Raphael, et al.
Published: (2026)

GSS: Gated Subspace Steering for Selective Memorization Mitigation in LLMs
by: Zhang, Xuanqi, et al.
Published: (2026)

Safety Subspaces are Not Linearly Distinct: A Fine-Tuning Case Study
by: Ponkshe, Kaustubh, et al.
Published: (2025)

Rethinking Interpretability in the Era of Large Language Models
by: Singh, Chandan, et al.
Published: (2024)

Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
by: Wang, Peihao, et al.
Published: (2024)

Steering Language Models With Activation Engineering
by: Turner, Alexander Matt, et al.
Published: (2023)

Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models
by: Yang, Jiaxi, et al.
Published: (2026)

Interpretable Next-token Prediction via the Generalized Induction Head
by: Kim, Eunji, et al.
Published: (2024)

SafeSteer: Interpretable Safety Steering with Refusal-Evasion in LLMs
by: Ghosh, Shaona, et al.
Published: (2025)

Towards Reasonable Concept Bottleneck Models
by: Kalampalikis, Nektarios, et al.
Published: (2025)

Depth-Wise Activation Steering for Honest Language Models
by: Góral, Gracjan, et al.
Published: (2025)

Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency
by: Jiang, Xinyan, et al.
Published: (2026)

OncoReason: Structuring Clinical Reasoning in LLMs for Robust and Interpretable Survival Prediction
by: Hemadri, Raghu Vamshi, et al.
Published: (2025)

Activation Steering with a Feedback Controller
by: Nguyen, Dung V., et al.
Published: (2025)

HyperSteer: Activation Steering at Scale with Hypernetworks
by: Sun, Jiuding, et al.
Published: (2025)

Understanding In-context Learning of Addition via Activation Subspaces
by: Hu, Xinyan, et al.
Published: (2025)

Decoupled-Value Attention for Prior-Data Fitted Networks: GP Inference for Physical Equations
by: Sharma, Kaustubh, et al.
Published: (2025)

Dynamically Scaled Activation Steering
by: Ferrando, Alex, et al.
Published: (2025)