| Main Authors: | Li, Xuying, Li, Zhuo, Kosuga, Yuji, Bian, Victor |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.21819 |
Similar Items
Output Length Effect on DeepSeek-R1's Safety in Forced Thinking
by: Li, Xuying, et al.
Published: (2025)
Precision Knowledge Editing: Enhancing Safety in Large Language Models
by: Li, Xuying, et al.
Published: (2024)
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
by: Shu, Huizhen, et al.
Published: (2025)
Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation
by: Li, Xuying, et al.
Published: (2024)
The Resurgence of GCG Adversarial Attacks on Large Language Models
by: Tan, Yuting, et al.
Published: (2025)
LIONs: An Empirically Optimized Approach to Align Language Models
by: Yu, Xiao, et al.
Published: (2024)
RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents
by: Zhong, Haitian, et al.
Published: (2026)
Balancing Rewards in Text Summarization: Multi-Objective Reinforcement Learning via HyperVolume Optimization
by: Song, Junjie, et al.
Published: (2025)
LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward
by: Zhao, Yi, et al.
Published: (2025)
UniPoll: A Unified Social Media Poll Generation Framework via Multi-Objective Optimization
by: Li, Yixia, et al.
Published: (2023)
Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization
by: Cang, Yueyang, et al.
Published: (2026)
TL-GRPO: Turn-Level RL for Reasoning-Guided Iterative Optimization
by: Li, Peiji, et al.
Published: (2026)
Mitigating Social Bias in Large Language Models: A Multi-Objective Approach within a Multi-Agent Framework
by: Xu, Zhenjie, et al.
Published: (2024)
MOPO: Multi-Objective Prompt Optimization for Affective Text Generation
by: Resendiz, Yarik Menchaca, et al.
Published: (2024)
OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
by: Lin, Liang, et al.
Published: (2025)
Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
by: Chen, Peter, et al.
Published: (2025)
RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline
by: Duarte, André V., et al.
Published: (2025)
GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models
by: Zhang, Jixiao, et al.
Published: (2025)
GCRE-GPT: A Generative Model for Comparative Relation Extraction
by: Wang, Yequan, et al.
Published: (2023)
$λ$-GRPO: Unifying the GRPO Frameworks with Learnable Token Preferences
by: Wang, Yining, et al.
Published: (2025)
Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks
by: Zhao, Songwen, et al.
Published: (2025)
Open-domain Implicit Format Control for Large Language Model Generation
by: Yao, Yiqun, et al.
Published: (2024)
Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization
by: Rahman, Mizanur, et al.
Published: (2026)
Aligning Language Models with Human Preferences via a Bayesian Approach
by: Wang, Jiashuo, et al.
Published: (2023)
bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs
by: Ji, Wence, et al.
Published: (2025)
Aligning the Spectrum: Hybrid Graph Pre-training and Prompt Tuning across Homophily and Heterophily
by: Luo, Haitong, et al.
Published: (2025)
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment
by: Wang, Haowen, et al.
Published: (2025)
DRA-GRPO: Your GRPO Needs to Know Diverse Reasoning Paths for Mathematical Reasoning
by: Chen, Xiwen, et al.
Published: (2025)
Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models
by: Li, Chengao, et al.
Published: (2025)
Topic Modeling as Multi-Objective Contrastive Optimization
by: Nguyen, Thong, et al.
Published: (2024)
CogniAlign: Survivability-Grounded Multi-Agent Moral Reasoning for Safe and Transparent AI
by: Ali, Hasin Jawad, et al.
Published: (2025)
InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for Debating
by: Wang, Fuyu, et al.
Published: (2025)
Multi-Objective Large Language Model Unlearning
by: Pan, Zibin, et al.
Published: (2024)
Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment
by: Li, Moxin, et al.
Published: (2025)
Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning
by: Wang, Kaiwen, et al.
Published: (2024)
Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO
by: Tian, Yu, et al.
Published: (2026)
UniCBE: An Uniformity-driven Comparing Based Evaluation Framework with Unified Multi-Objective Optimization
by: Yuan, Peiwen, et al.
Published: (2025)
BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models
by: Wang, Xinyuan, et al.
Published: (2024)
Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning
by: Deng, Jingcheng, et al.
Published: (2026)
Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach
by: Zhang, Xinnan, et al.
Published: (2025)