| Main Authors: | Zhang, Ruihan, Sun, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.03401 |
Similar Items
Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data
by: Li, Xinzhe, et al.
Published: (2023)
The Unlearnability Phenomenon in RLVR for Language Models
by: Chen, Yulin, et al.
Published: (2026)
Towards Provably Unlearnable Examples via Bayes Error Optimisation
by: Zhang, Ruihan, et al.
Published: (2025)
Efficient Knowledge Infusion via KG-LLM Alignment
by: Jiang, Zhouyu, et al.
Published: (2024)
The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems
by: Zhang, Yihao, et al.
Published: (2026)
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
by: Cao, Maosong, et al.
Published: (2025)
Exploring LLM-based Data Annotation Strategies for Medical Dialogue Preference Alignment
by: Dou, Chengfeng, et al.
Published: (2024)
Exploiting Pseudo Image Captions for Multimodal Summarization
by: Jiang, Chaoya, et al.
Published: (2023)
RM-Distiller: Exploiting Generative LLM for Reward Model Distillation
by: Zhou, Hongli, et al.
Published: (2026)
ASGEA: Exploiting Logic Rules from Align-Subgraphs for Entity Alignment
by: Luo, Yangyifei, et al.
Published: (2024)
Unlocking LLM Safeguards for Low-Resource Languages via Reasoning and Alignment with Minimal Training Data
by: Chen, Zhuowei, et al.
Published: (2025)
Beyond Static Alignment: Hierarchical Policy Control for LLM Safety via Risk-Aware Chain-of-Thought
by: Si, Jianfeng, et al.
Published: (2026)
LLM-as-a-Judge for Privacy Evaluation? Exploring the Alignment of Human and LLM Perceptions of Privacy in Textual Data
by: Meisenbacher, Stephen, et al.
Published: (2025)
PMMT: Preference Alignment in Multilingual Machine Translation via LLM Distillation
by: Sun, Shuqiao, et al.
Published: (2024)
REWARD CONSISTENCY: Improving Multi-Objective Alignment from a Data-Centric Perspective
by: Xu, Zhihao, et al.
Published: (2025)
Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection
by: Sun, Guanglong, et al.
Published: (2026)
Evaluating LLM Alignment on Personality Inference from Real-World Interview Data
by: Zhu, Jianfeng, et al.
Published: (2025)
Evaluating Human Alignment and Model Faithfulness of LLM Rationale
by: Fayyaz, Mohsen, et al.
Published: (2024)
Understanding Layer Significance in LLM Alignment
by: Shi, Guangyuan, et al.
Published: (2024)
Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception
by: Ni, Shiyu, et al.
Published: (2025)
Can We Infer Confidential Properties of Training Data from LLMs?
by: Huang, Pengrun, et al.
Published: (2025)
Learning More from Less: Exploiting Counterfactuals for Data-Efficient Chart Understanding
by: Bao, Jianzhu, et al.
Published: (2026)
The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
by: Yang, Ruihan, et al.
Published: (2025)
Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation
by: Sun, Chenkai, et al.
Published: (2025)
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
by: Li, Yifan, et al.
Published: (2024)
MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment
by: Qin, Weicong, et al.
Published: (2025)
Societal Alignment Frameworks Can Improve LLM Alignment
by: Stańczak, Karolina, et al.
Published: (2025)
Alignment Drift in Long-Term Human-LLM Interaction: A Mechanism-Oriented Framework
by: Yao, Xintong
Published: (2026)
Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era
by: Wu, Xuansheng, et al.
Published: (2024)
CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation
by: Bethany, Mazal, et al.
Published: (2025)
Generalizable LLM Learning of Graph Synthetic Data with Post-training Alignment
by: Zhang, Yizhuo, et al.
Published: (2025)
Leveraging Robust Optimization for LLM Alignment under Distribution Shifts
by: Zhu, Mingye, et al.
Published: (2025)
SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization
by: Huang, Yue, et al.
Published: (2025)
Provable Defense Framework for LLM Jailbreaks via Noise-Augumented Alignment
by: Cheng, Zehua, et al.
Published: (2026)
StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models
by: Khan, Ishmam, et al.
Published: (2026)
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
by: Zhou, Enyu, et al.
Published: (2024)
PAD: Towards Efficient Data Generation for Transfer Learning Using Phrase Alignment
by: Kim, Jong Myoung, et al.
Published: (2025)
SaRO: Enhancing LLM Safety through Reasoning-based Alignment
by: Mou, Yutao, et al.
Published: (2025)
Data-efficient LLM Fine-tuning for Code Generation
by: Lv, Weijie, et al.
Published: (2025)
MICA: Multi-granularity Intertemporal Credit Assignment for Long-Horizon Emotional Support Dialogue
by: Zhang, Naifan, et al.
Published: (2026)