:: Library Catalog

Bibliographic Details
Main Authors:	Chia, Xin Wei; Wong, Swee Liang; Pan, Jonathan
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.18085

Similar Items

Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States
by: Chia, Xin Wei, et al.
Published: (2025)

Enhancing Reasoning Capacity of SLM using Cognitive Enhancement
by: Pan, Jonathan, et al.
Published: (2024)

Prompt Inject Detection with Generative Explanation as an Investigative Tool
by: Pan, Jonathan, et al.
Published: (2025)

Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models
by: Jiang, Xinyan, et al.
Published: (2025)

Exploring the Personality Traits of LLMs through Latent Features Steering
by: Yang, Shu, et al.
Published: (2024)

Interactive Debugging and Steering of Multi-Agent AI Systems
by: Epperson, Will, et al.
Published: (2025)

Mitigating Misalignment Contagion by Steering with Implicit Traits
by: Chang, Maria, et al.
Published: (2026)

Exploitation Without Deception: Dark Triad Feature Steering Reveals Separable Antisocial Circuits in Language Models
by: Berg, Cameron, et al.
Published: (2026)

Inference-Time Policy Steering through Human Interactions
by: Wang, Yanwei, et al.
Published: (2024)

BarrierSteer: LLM Safety via Learning Barrier Steering
by: Tran, Thanh Q., et al.
Published: (2026)

Human-Centered Human-AI Interaction (HC-HAII): A Human-Centered AI Perspective
by: Xu, Wei
Published: (2025)

OpenDeception: Learning Deception and Trust in Human-AI Interaction via Multi-Agent Simulation
by: Wu, Yichen, et al.
Published: (2025)

Human-AI Interaction Design Standards
by: Zhao, Chaoyi, et al.
Published: (2025)

Mixture-of-Subspaces in Low-Rank Adaptation
by: Wu, Taiqiang, et al.
Published: (2024)

What Do You Mean? Exploring How Humans and AI Interact with Symbols and Meanings in Their Interactions
by: Habibi, Reza, et al.
Published: (2025)

CogSteer: Cognition-Inspired Selective Layer Intervention for Efficiently Steering Large Language Models
by: Wang, Xintong, et al.
Published: (2024)

The Dark Side of Digital Twins: Adversarial Attacks on AI-Driven Water Forecasting
by: Homaei, Mohammadhossein, et al.
Published: (2025)

PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration
by: Yu, Manjiang, et al.
Published: (2025)

Revealed Multi-Objective Utility Aggregation in Human Driving
by: Sarkar, Atrisha, et al.
Published: (2023)

On the Same Page: Dimensions of Perceived Shared Understanding in Human-AI Interaction
by: Liang, Qingyu, et al.
Published: (2025)

Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition
by: Kazemitabaar, Majeed, et al.
Published: (2024)

Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration
by: Woolley, Sandra, et al.
Published: (2026)

Adaptive Budget Allocation for Orthogonal-Subspace Adapter Tuning in LLMs Continual Learning
by: Wan, Zhiyi, et al.
Published: (2025)

Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education
by: Pan, Wei Hung, et al.
Published: (2024)

Beyond Partner Diversity: An Influence-Based Team Steering Framework for Zero-Shot Human-Machine Teaming
by: Sheng, Wei, et al.
Published: (2026)

The Dark Side of AI Transformers: Sentiment Polarization & the Loss of Business Neutrality by NLP Transformers
by: Kumar, Prasanna
Published: (2026)

The Cognitive Circuit Breaker: A Systems Engineering Framework for Intrinsic AI Reliability
by: Pan, Jonathan
Published: (2026)

Model Merging in the Essential Subspace
by: Li, Longhua, et al.
Published: (2026)

Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement
by: Islam, Sekh Mainul, et al.
Published: (2025)

CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
by: Lv, Hang, et al.
Published: (2025)

Revealing the Power of Masked Autoencoders in Traffic Forecasting
by: Sun, Jiarui, et al.
Published: (2023)

The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
by: Chen, Bocheng, et al.
Published: (2024)

Exploring the Impact of Personality Traits on LLM Bias and Toxicity
by: Wang, Shuo, et al.
Published: (2025)

Harmful Traits of AI Companions
by: Knox, W. Bradley, et al.
Published: (2025)

Rationale Behind Essay Scores: Enhancing S-LLM's Multi-Trait Essay Scoring with Rationale Generated by LLMs
by: Chu, SeongYeub, et al.
Published: (2024)

AI-exhibited Personality Traits Can Shape Human Self-concept through Conversations
by: Li, Jingshu, et al.
Published: (2026)

Steering LLMs via Scalable Interactive Oversight
by: Zhou, Enyu, et al.
Published: (2026)

Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents
by: He, Muyu, et al.
Published: (2025)

Human-Centered Human-AI Collaboration (HCHAC)
by: Gao, Qi, et al.
Published: (2025)

Explainable Human-AI Interaction: A Planning Perspective
by: Sreedharan, Sarath, et al.
Published: (2024)