| Main Authors: | Wang, Binghai, Zheng, Rui, Chen, Lu, Liu, Yan, Dou, Shihan, Huang, Caishuang, Shen, Wei, Jin, Senjie, Zhou, Enyu, Shi, Chenyu, Gao, Songyang, Xu, Nuo, Zhou, Yuhao, Fan, Xiaoran, Xi, Zhiheng, Zhao, Jun, Wang, Xiao, Ji, Tao, Yan, Hang, Shen, Lixing, Chen, Zhan, Gui, Tao, Zhang, Qi, Qiu, Xipeng, Huang, Xuanjing, Wu, Zuxuan, Jiang, Yu-Gang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.06080 |
Similar Items
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
by: Dou, Shihan, et al.
Published: (2024)
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
by: Dou, Shihan, et al.
Published: (2023)
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
by: Xi, Zhiheng, et al.
Published: (2023)
MouSi: Poly-Visual-Expert Vision-Language Models
by: Fan, Xiaoran, et al.
Published: (2024)
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
by: Ye, Junjie, et al.
Published: (2024)
VRPO: Rethinking Value Modeling for Robust RL Training under Noisy Supervision
by: Zhu, Dingwei, et al.
Published: (2025)
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
by: Zhou, Enyu, et al.
Published: (2024)
JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees
by: Wang, Yuhui, et al.
Published: (2026)
MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning
by: Lin, Jiahang, et al.
Published: (2026)
Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
by: Zhang, Zhihao, et al.
Published: (2025)
Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution
by: Xu, Nuo, et al.
Published: (2024)
CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models
by: Lv, Huijie, et al.
Published: (2024)
RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning
by: Ye, Junjie, et al.
Published: (2024)
EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection
by: Zhou, Yuhao, et al.
Published: (2025)
AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress
by: Xi, Zhiheng, et al.
Published: (2025)
Steering LLMs via Scalable Interactive Oversight
by: Zhou, Enyu, et al.
Published: (2026)
Pre-Trained Policy Discriminators are General Reward Models
by: Dou, Shihan, et al.
Published: (2025)
Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
by: Liu, Boyang, et al.
Published: (2025)
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2024)
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
by: Huang, Caishuang, et al.
Published: (2024)
ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages
by: Ye, Junjie, et al.
Published: (2024)
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
by: Zhou, Weikang, et al.
Published: (2024)
MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models
by: Fan, Xiaoran, et al.
Published: (2026)
The Role of Entropy in Visual Grounding: Analysis and Optimization
by: Li, Shuo, et al.
Published: (2025)
FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions
by: Li, Peng, et al.
Published: (2026)
Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals
by: Zheng, Rui, et al.
Published: (2024)
MetaRM: Shifted Distributions Alignment via Meta-Learning
by: Dou, Shihan, et al.
Published: (2024)
TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities
by: Zhang, Ming, et al.
Published: (2024)
DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training
by: Zhu, Dingwei, et al.
Published: (2026)
Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training
by: Jiang, Changhao, et al.
Published: (2025)
Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning
by: Jin, Senjie, et al.
Published: (2025)
DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training
by: Zhu, Dingwei, et al.
Published: (2025)
EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training
by: Pan, Chengjun, et al.
Published: (2026)
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
by: Xia, Han, et al.
Published: (2024)
LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition
by: Ye, Junjie, et al.
Published: (2024)
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study
by: Dou, Shihan, et al.
Published: (2024)
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle
by: Bao, Rong, et al.
Published: (2024)
Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
by: Lin, Jiahang, et al.
Published: (2026)
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
by: Xi, Zhiheng, et al.
Published: (2024)
Improving RL Exploration for LLM Reasoning through Retrospective Replay
by: Dou, Shihan, et al.
Published: (2025)