| Main Authors: | Jiang, Xinyan, Zhang, Lin, Zhang, Jiayi, Yang, Qingsong, Hu, Guimin, Wang, Di, Hu, Lijie |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.10599 |
Similar Items
Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency
by: Jiang, Xinyan, et al.
Published: (2026)
Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images
by: You, Liangliang, et al.
Published: (2025)
PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration
by: Yu, Manjiang, et al.
Published: (2025)
Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability
by: Jiang, Xinyan, et al.
Published: (2026)
Functional Subspace Watermarking for Large Language Models
by: Ding, Zikang, et al.
Published: (2026)
Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
by: Zhang, Lin, et al.
Published: (2025)
Controlling Repetition in Protein Language Models
by: Zhang, Jiahao, et al.
Published: (2026)
Understanding In-context Learning of Addition via Activation Subspaces
by: Hu, Xinyan, et al.
Published: (2025)
Partitioner Guided Modal Learning Framework
by: Hu, Guimin, et al.
Published: (2025)
The Compositional Architecture of Regret in Large Language Models
by: Cui, Xiangxiang, et al.
Published: (2025)
FaithSteer-BENCH: A Deployment-Aligned Stress-Testing Benchmark for Inference-Time Steering
by: Ding, Zikang, et al.
Published: (2026)
Exploring the Personality Traits of LLMs through Latent Features Steering
by: Yang, Shu, et al.
Published: (2024)
Visual Self-Fulfilling Alignment: Shaping Safety-Oriented Personas via Threat-Related Images
by: Yang, Qishun, et al.
Published: (2026)
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
by: Yu, Xiaomin, et al.
Published: (2026)
Towards Multi-dimensional Explanation Alignment for Medical Classification
by: Hu, Lijie, et al.
Published: (2024)
Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models
by: Zhou, Andy
Published: (2025)
SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents
by: Li, Dawei, et al.
Published: (2024)
Efficient Text-Attributed Graph Learning through Selective Annotation and Graph Alignment
by: Xie, Huanyi, et al.
Published: (2025)
Predicting LLM Output Length via Entropy-Guided Representations
by: Xie, Huanyi, et al.
Published: (2026)
Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering
by: Li, Xiaomin, et al.
Published: (2026)
Private Language Models via Truncated Laplacian Mechanism
by: Huang, Tianhao, et al.
Published: (2024)
UGID: Unified Graph Isomorphism for Debiasing Large Language Models
by: Ding, Zikang, et al.
Published: (2026)
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
by: Chuang, Yung-Sung, et al.
Published: (2025)
In-Run Data Shapley for Adam Optimizer
by: Ding, Meng, et al.
Published: (2026)
Improving Attributed Text Generation of Large Language Models via Preference Learning
by: Li, Dongfang, et al.
Published: (2024)
Multi-Attribute Steering of Language Models via Targeted Intervention
by: Nguyen, Duy, et al.
Published: (2025)
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
by: Xu, Yuemei, et al.
Published: (2024)
Dialectical Alignment: Resolving the Tension of 3H and Security Threats of LLMs
by: Yang, Shu, et al.
Published: (2024)
Towards Reasoning-Preserving Unlearning in Multimodal Large Language Models
by: Li, Hongji, et al.
Published: (2025)
Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression
by: Ali, Muhammad Asif, et al.
Published: (2024)
Understanding the Dynamics of Demonstration Conflict in In-Context Learning
by: Jiao, Difan, et al.
Published: (2026)
Multi-Adapter Representation Interventions via Energy Calibration
by: Yu, Manjiang, et al.
Published: (2026)
CODEMENV: Benchmarking Large Language Models on Code Migration
by: Cheng, Keyuan, et al.
Published: (2025)
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
by: Shi, Yang, et al.
Published: (2025)
LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion
by: Zhou, Guanghao, et al.
Published: (2026)
On the Limitations of Steering in Language Model Alignment
by: Niranjan, Chebrolu, et al.
Published: (2025)
PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization
by: Wang, Xinhai, et al.
Published: (2025)
Steering When Necessary: Flexible Steering Large Language Models with Backtracking
by: Cheng, Zifeng, et al.
Published: (2025)
An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models
by: Sun, Haoran, et al.
Published: (2025)
Beyond Linear Steering: Unified Multi-Attribute Control for Language Models
by: Oozeer, Narmeen, et al.
Published: (2025)