:: Library Catalog

Bibliographic Details
Main Authors:	Shu, Huizhen; Li, Xuying; Li, Zhuo
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.19839

Similar Items

The Resurgence of GCG Adversarial Attacks on Large Language Models
by: Tan, Yuting, et al.
Published: (2025)

Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
by: Shu, Huizhen, et al.
Published: (2025)

LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries
by: Ren, Xuancheng, et al.
Published: (2026)

Latent-space Attacks for Refusal Evasion in Language Models
by: Piras, Giorgio, et al.
Published: (2026)

Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection
by: Hu, Xulin, et al.
Published: (2026)

From Refusal Tokens to Refusal Control: Discovering and Steering Category-Specific Refusal Directions
by: Alagharu, Rishab, et al.
Published: (2026)

Exploring the Personality Traits of LLMs through Latent Features Steering
by: Yang, Shu, et al.
Published: (2024)

Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation
by: Li, Xuying, et al.
Published: (2024)

Unveiling and Steering Connectome Organization with Interpretable Latent Variables
by: Li, Yubin, et al.
Published: (2025)

Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics
by: García-Ferrero, Iker, et al.
Published: (2025)

Steer LLM Latents for Hallucination Detection
by: Park, Seongheon, et al.
Published: (2025)

Latent Guard: a Safety Framework for Text-to-image Generation
by: Liu, Runtao, et al.
Published: (2024)

Programming Refusal with Conditional Activation Steering
by: Lee, Bruce W., et al.
Published: (2024)

Controllable Mathematical Reasoning via Self-Optimizing Thought Vectors
by: Li, Xuying
Published: (2025)

Output Length Effect on DeepSeek-R1's Safety in Forced Thinking
by: Li, Xuying, et al.
Published: (2025)

Feature-Guided SAE Steering for Refusal-Rate Control using Contrasting Prompts
by: Bhargav, Samaksh, et al.
Published: (2025)

RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering
by: Ye, Wencheng, et al.
Published: (2026)

AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
by: Sheng, Leheng, et al.
Published: (2025)

Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation
by: Prokopiou, Ioannis, et al.
Published: (2026)

Transferable Latent-to-Latent Locomotion Policy for Efficient and Versatile Motion Control of Diverse Legged Robots
by: Zheng, Ziang, et al.
Published: (2025)

Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism
by: Cao, Lang
Published: (2023)

What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal
by: Cheng, Stephen, et al.
Published: (2026)

Spatial-Aware Latent Initialization for Controllable Image Generation
by: Sun, Wenqiang, et al.
Published: (2024)

Preemptive Detection and Steering of LLM Misalignment via Latent Reachability
by: Karnik, Sathwik, et al.
Published: (2025)

On Effects of Steering Latent Representation for Large Language Model Unlearning
by: Huu-Tien, Dang, et al.
Published: (2024)

Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs
by: Li, Jiakang, et al.
Published: (2026)

RepIt: Steering Language Models with Concept-Specific Refusal Vectors
by: Siu, Vincent, et al.
Published: (2025)

Latent Action Control for Reasoning-Guided Unified Image Generation
by: Zhai, Fuxiang, et al.
Published: (2026)

Controllable and Stealthy Shilling Attacks via Dispersive Latent Diffusion
by: Qiao, Shutong, et al.
Published: (2025)

Precision Knowledge Editing: Enhancing Safety in Large Language Models
by: Li, Xuying, et al.
Published: (2024)

Latent Policy Steering with Embodiment-Agnostic Pretrained World Models
by: Wang, Yiqi, et al.
Published: (2025)

Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal
by: Yang, Kia-Jüng, et al.
Published: (2026)

Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models
by: Yang, Jiaxi, et al.
Published: (2026)

Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization
by: Egbuna, Nathan, et al.
Published: (2025)

Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs
by: Liu, Andy Zeyi, et al.
Published: (2026)

Learning Latent Dynamic Robust Representations for World Models
by: Sun, Ruixiang, et al.
Published: (2024)

FlowGuard: Towards Lightweight In-Generation Safety Detection for Diffusion Models via Linear Latent Decoding
by: Yang, Jinghan, et al.
Published: (2026)

In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
by: Liu, Sheng, et al.
Published: (2023)

Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation
by: Zhang, Wentao, et al.
Published: (2026)

Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
by: Qiu, Kai, et al.
Published: (2025)