| Main Authors: | Yang, Tiankai, Nian, Yi, Li, Xinyuan, Xu, Ruiyao, Ding, Kaize, Zhao, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.17299 |
Similar Items
No Attacker Needed: Unintentional Cross-User Contamination in Shared-State LLM Agents
by: Yang, Tiankai, et al.
Published: (2026)
GNN-as-Judge: Unleashing the Power of LLMs for Graph Learning with GNN Feedback
by: Xu, Ruiyao, et al.
Published: (2026)
AD-LLM: Benchmarking Large Language Models for Anomaly Detection
by: Yang, Tiankai, et al.
Published: (2024)
CoAct: Co-Active LLM Preference Learning with Human-AI Synergy
by: Xu, Ruiyao, et al.
Published: (2026)
AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
by: Wu, Junkang, et al.
Published: (2024)
Synthetic Clinical Notes for Rare ICD Codes: A Data-Centric Framework for Long-Tail Medical Coding
by: Vo, Truong, et al.
Published: (2025)
Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment
by: Hu, Mengxuan, et al.
Published: (2026)
Empowering Large Language Models for Textual Data Augmentation
by: Li, Yichuan, et al.
Published: (2024)
MixDPO: Modeling Preference Strength for Pluralistic Alignment
by: Imai, Saki, et al.
Published: (2026)
StealthRank: LLM Ranking Manipulation via Stealthy Prompt Optimization
by: Tang, Yiming, et al.
Published: (2025)
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
by: Lee, Andrew, et al.
Published: (2024)
Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection
by: Pandit, Shrey, et al.
Published: (2025)
SP^2DPO: An LLM-assisted Semantic Per-Pair DPO Generalization
by: He, Chaoyue, et al.
Published: (2026)
Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences
by: Pattnaik, Pulkit, et al.
Published: (2024)
APLe: Token-Wise Adaptive for Multi-Modal Prompt Learning
by: Cao, Guiming, et al.
Published: (2024)
STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules
by: Wu, Di, et al.
Published: (2026)
The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval
by: Tong, Zekai, et al.
Published: (2026)
Don't Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning
by: Qin, Yuehan, et al.
Published: (2025)
daDPO: Distribution-Aware DPO for Distilling Conversational Abilities
by: Zhang, Zhengze, et al.
Published: (2025)
MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models
by: Mao, Kangkun, et al.
Published: (2025)
AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection
by: Yang, Tiankai, et al.
Published: (2025)
An Empirical Study of SFT-DPO Interaction and Parameterization in Small Language Models
by: Feng, Yuming, et al.
Published: (2026)
When Only the Final Text Survives: Implicit Execution Tracing for Multi-Agent Attribution
by: Nian, Yi, et al.
Published: (2026)
Self-Guided Defense: Adaptive Safety Alignment for Reasoning Models via Synthesized Guidelines
by: Wang, Yuhang, et al.
Published: (2025)
Aligning Large Language Models with Counterfactual DPO
by: Butcher, Bradley
Published: (2024)
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
by: Luo, Junyu, et al.
Published: (2024)
DPO Meets PPO: Reinforced Token Optimization for RLHF
by: Zhong, Han, et al.
Published: (2024)
LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories)
by: Xu, Rongge, et al.
Published: (2025)
Safety Alignment via Constrained Knowledge Unlearning
by: Shi, Zesheng, et al.
Published: (2025)
Improving LLM Safety and Helpfulness using SFT and DPO: A Study on OPT-350M
by: Pant, Piyush
Published: (2025)
SmurfCat at PAN 2024 TextDetox: Alignment of Multilingual Transformers for Text Detoxification
by: Rykov, Elisei, et al.
Published: (2024)
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
by: Khaki, Saeed, et al.
Published: (2024)
Safety Is Not Universal: The Selective Safety Trap in LLM Alignment
by: Brito, Iago Alves, et al.
Published: (2026)
Rethinking DPO: The Role of Rejected Responses in Preference Misalignment
by: Cho, Jay Hyeon, et al.
Published: (2025)
Hidden Error Awareness in Chain-of-Thought Reasoning: The Signal Is Diagnostic, Not Causal
by: Yuan, Aojie, et al.
Published: (2026)
MolMem: Memory-Augmented Agentic Reinforcement Learning for Sample-Efficient Molecular Optimization
by: Wang, Ziqing, et al.
Published: (2026)
AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering
by: Wang, Ziqing, et al.
Published: (2025)
Towards Inference-time Category-wise Safety Steering for Large Language Models
by: Bhattacharjee, Amrita, et al.
Published: (2024)
Context-DPO: Aligning Language Models for Context-Faithfulness
by: Bi, Baolong, et al.
Published: (2024)
Reasoning Pattern Alignment Merging for Adaptive Reasoning
by: Zhong, Zhaofeng, et al.
Published: (2026)