| Main Authors: | Chia, Xin Wei, Wong, Swee Liang, Pan, Jonathan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.18085 |
Similar Items
Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States
by: Chia, Xin Wei, et al.
Published: (2025)
Enhancing Reasoning Capacity of SLM using Cognitive Enhancement
by: Pan, Jonathan, et al.
Published: (2024)
Prompt Inject Detection with Generative Explanation as an Investigative Tool
by: Pan, Jonathan, et al.
Published: (2025)
Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models
by: Jiang, Xinyan, et al.
Published: (2025)
Exploring the Personality Traits of LLMs through Latent Features Steering
by: Yang, Shu, et al.
Published: (2024)
Interactive Debugging and Steering of Multi-Agent AI Systems
by: Epperson, Will, et al.
Published: (2025)
Mitigating Misalignment Contagion by Steering with Implicit Traits
by: Chang, Maria, et al.
Published: (2026)
Exploitation Without Deception: Dark Triad Feature Steering Reveals Separable Antisocial Circuits in Language Models
by: Berg, Cameron, et al.
Published: (2026)
Inference-Time Policy Steering through Human Interactions
by: Wang, Yanwei, et al.
Published: (2024)
BarrierSteer: LLM Safety via Learning Barrier Steering
by: Tran, Thanh Q., et al.
Published: (2026)
Human-Centered Human-AI Interaction (HC-HAII): A Human-Centered AI Perspective
by: Xu, Wei
Published: (2025)
OpenDeception: Learning Deception and Trust in Human-AI Interaction via Multi-Agent Simulation
by: Wu, Yichen, et al.
Published: (2025)
Human-AI Interaction Design Standards
by: Zhao, Chaoyi, et al.
Published: (2025)
Mixture-of-Subspaces in Low-Rank Adaptation
by: Wu, Taiqiang, et al.
Published: (2024)
What Do You Mean? Exploring How Humans and AI Interact with Symbols and Meanings in Their Interactions
by: Habibi, Reza, et al.
Published: (2025)
CogSteer: Cognition-Inspired Selective Layer Intervention for Efficiently Steering Large Language Models
by: Wang, Xintong, et al.
Published: (2024)
The Dark Side of Digital Twins: Adversarial Attacks on AI-Driven Water Forecasting
by: Homaei, Mohammadhossein, et al.
Published: (2025)
PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration
by: Yu, Manjiang, et al.
Published: (2025)
Revealed Multi-Objective Utility Aggregation in Human Driving
by: Sarkar, Atrisha, et al.
Published: (2023)
On the Same Page: Dimensions of Perceived Shared Understanding in Human-AI Interaction
by: Liang, Qingyu, et al.
Published: (2025)
Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition
by: Kazemitabaar, Majeed, et al.
Published: (2024)
Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration
by: Woolley, Sandra, et al.
Published: (2026)
Adaptive Budget Allocation for Orthogonal-Subspace Adapter Tuning in LLMs Continual Learning
by: Wan, Zhiyi, et al.
Published: (2025)
Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education
by: Pan, Wei Hung, et al.
Published: (2024)
Beyond Partner Diversity: An Influence-Based Team Steering Framework for Zero-Shot Human-Machine Teaming
by: Sheng, Wei, et al.
Published: (2026)
The Dark Side of AI Transformers: Sentiment Polarization & the Loss of Business Neutrality by NLP Transformers
by: Kumar, Prasanna
Published: (2026)
The Cognitive Circuit Breaker: A Systems Engineering Framework for Intrinsic AI Reliability
by: Pan, Jonathan
Published: (2026)
Model Merging in the Essential Subspace
by: Li, Longhua, et al.
Published: (2026)
Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement
by: Islam, Sekh Mainul, et al.
Published: (2025)
CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering
by: Lv, Hang, et al.
Published: (2025)
Revealing the Power of Masked Autoencoders in Traffic Forecasting
by: Sun, Jiarui, et al.
Published: (2023)
The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
by: Chen, Bocheng, et al.
Published: (2024)
Exploring the Impact of Personality Traits on LLM Bias and Toxicity
by: Wang, Shuo, et al.
Published: (2025)
Harmful Traits of AI Companions
by: Knox, W. Bradley, et al.
Published: (2025)
Rationale Behind Essay Scores: Enhancing S-LLM's Multi-Trait Essay Scoring with Rationale Generated by LLMs
by: Chu, SeongYeub, et al.
Published: (2024)
AI-exhibited Personality Traits Can Shape Human Self-concept through Conversations
by: Li, Jingshu, et al.
Published: (2026)
Steering LLMs via Scalable Interactive Oversight
by: Zhou, Enyu, et al.
Published: (2026)
Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents
by: He, Muyu, et al.
Published: (2025)
Human-Centered Human-AI Collaboration (HCHAC)
by: Gao, Qi, et al.
Published: (2025)
Explainable Human-AI Interaction: A Planning Perspective
by: Sreedharan, Sarath, et al.
Published: (2024)