:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Ruihan; Sun, Jun
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.03401

Similar Items

Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data
by: Li, Xinzhe, et al.
Published: (2023)

The Unlearnability Phenomenon in RLVR for Language Models
by: Chen, Yulin, et al.
Published: (2026)

Towards Provably Unlearnable Examples via Bayes Error Optimisation
by: Zhang, Ruihan, et al.
Published: (2025)

Efficient Knowledge Infusion via KG-LLM Alignment
by: Jiang, Zhouyu, et al.
Published: (2024)

The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems
by: Zhang, Yihao, et al.
Published: (2026)

Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
by: Cao, Maosong, et al.
Published: (2025)

Exploring LLM-based Data Annotation Strategies for Medical Dialogue Preference Alignment
by: Dou, Chengfeng, et al.
Published: (2024)

Exploiting Pseudo Image Captions for Multimodal Summarization
by: Jiang, Chaoya, et al.
Published: (2023)

RM-Distiller: Exploiting Generative LLM for Reward Model Distillation
by: Zhou, Hongli, et al.
Published: (2026)

ASGEA: Exploiting Logic Rules from Align-Subgraphs for Entity Alignment
by: Luo, Yangyifei, et al.
Published: (2024)

Unlocking LLM Safeguards for Low-Resource Languages via Reasoning and Alignment with Minimal Training Data
by: Chen, Zhuowei, et al.
Published: (2025)

Beyond Static Alignment: Hierarchical Policy Control for LLM Safety via Risk-Aware Chain-of-Thought
by: Si, Jianfeng, et al.
Published: (2026)

LLM-as-a-Judge for Privacy Evaluation? Exploring the Alignment of Human and LLM Perceptions of Privacy in Textual Data
by: Meisenbacher, Stephen, et al.
Published: (2025)

PMMT: Preference Alignment in Multilingual Machine Translation via LLM Distillation
by: Sun, Shuqiao, et al.
Published: (2024)

REWARD CONSISTENCY: Improving Multi-Objective Alignment from a Data-Centric Perspective
by: Xu, Zhihao, et al.
Published: (2025)

Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection
by: Sun, Guanglong, et al.
Published: (2026)

Evaluating LLM Alignment on Personality Inference from Real-World Interview Data
by: Zhu, Jianfeng, et al.
Published: (2025)

Evaluating Human Alignment and Model Faithfulness of LLM Rationale
by: Fayyaz, Mohsen, et al.
Published: (2024)

Understanding Layer Significance in LLM Alignment
by: Shi, Guangyuan, et al.
Published: (2024)

Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception
by: Ni, Shiyu, et al.
Published: (2025)

Can We Infer Confidential Properties of Training Data from LLMs?
by: Huang, Pengrun, et al.
Published: (2025)

Learning More from Less: Exploiting Counterfactuals for Data-Efficient Chart Understanding
by: Bao, Jianzhu, et al.
Published: (2026)

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
by: Yang, Ruihan, et al.
Published: (2025)

Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation
by: Sun, Chenkai, et al.
Published: (2025)

Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
by: Li, Yifan, et al.
Published: (2024)

MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment
by: Qin, Weicong, et al.
Published: (2025)

Societal Alignment Frameworks Can Improve LLM Alignment
by: Stańczak, Karolina, et al.
Published: (2025)

Alignment Drift in Long-Term Human-LLM Interaction: A Mechanism-Oriented Framework
by: Yao, Xintong
Published: (2026)

Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era
by: Wu, Xuansheng, et al.
Published: (2024)

CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation
by: Bethany, Mazal, et al.
Published: (2025)

Generalizable LLM Learning of Graph Synthetic Data with Post-training Alignment
by: Zhang, Yizhuo, et al.
Published: (2025)

Leveraging Robust Optimization for LLM Alignment under Distribution Shifts
by: Zhu, Mingye, et al.
Published: (2025)

SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization
by: Huang, Yue, et al.
Published: (2025)

Provable Defense Framework for LLM Jailbreaks via Noise-Augumented Alignment
by: Cheng, Zehua, et al.
Published: (2026)

StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models
by: Khan, Ishmam, et al.
Published: (2026)

RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
by: Zhou, Enyu, et al.
Published: (2024)

PAD: Towards Efficient Data Generation for Transfer Learning Using Phrase Alignment
by: Kim, Jong Myoung, et al.
Published: (2025)

SaRO: Enhancing LLM Safety through Reasoning-based Alignment
by: Mou, Yutao, et al.
Published: (2025)

Data-efficient LLM Fine-tuning for Code Generation
by: Lv, Weijie, et al.
Published: (2025)

MICA: Multi-granularity Intertemporal Credit Assignment for Long-Horizon Emotional Support Dialogue
by: Zhang, Naifan, et al.
Published: (2026)