| Main Authors: | Chen, Yizhou, Liu, Yawen, Wang, Xuesi, Yu, Qingtao, Huzhang, Guangda, Zeng, Anxiang, Yu, Han, Zhou, Zhiming |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.14838 |
Similar Items
Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization
by: Li, Meng, et al.
Published: (2025)
Residual Multi-Task Learner for Applied Ranking
by: Fu, Cong, et al.
Published: (2024)
Towards Reliable Evaluation of Large Language Models for Multilingual and Multimodal E-Commerce Applications
by: Xie, Shuyi, et al.
Published: (2025)
Uncertainty Estimation of Large Language Models in Medical Question Answering
by: Wu, Jiaxin, et al.
Published: (2024)
AMix-2: Establishing Protein as a Native Modality in Large Language Models
by: Qiu, Keyue, et al.
Published: (2026)
Secrets of RLHF in Large Language Models Part II: Reward Modeling
by: Wang, Binghai, et al.
Published: (2024)
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
by: Wang, Chaoqi, et al.
Published: (2025)
Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization
by: Wang, Yibo, et al.
Published: (2026)
Full-ECE: A Metric For Token-level Calibration on Large Language Models
by: Liu, Han, et al.
Published: (2024)
DataMan: Data Manager for Pre-training Large Language Models
by: Peng, Ru, et al.
Published: (2025)
RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models
by: Feng, Xiao, et al.
Published: (2026)
BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models
by: Liu, Shuaitong, et al.
Published: (2025)
RewardDS: Privacy-Preserving Fine-Tuning for Large Language Models via Reward Driven Data Synthesis
by: Wang, Jianwei, et al.
Published: (2025)
Compass-Thinker-7B Technical Report
by: Zeng, Anxiang, et al.
Published: (2025)
Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models
by: Ding, Meidan, et al.
Published: (2025)
CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning
by: Tang, Zhijiang, et al.
Published: (2026)
Quantized Large Language Models in Biomedical Natural Language Processing: Evaluation and Recommendation
by: Zhan, Zaifu, et al.
Published: (2025)
Large Language Models for Robotics: A Survey
by: Zeng, Fanlong, et al.
Published: (2023)
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
by: Liu, Chris Yuhao, et al.
Published: (2024)
Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models
by: Liu, Yan, et al.
Published: (2026)
RewardAnything: Generalizable Principle-Following Reward Models
by: Yu, Zhuohao, et al.
Published: (2025)
Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models
by: Wu, Yanru, et al.
Published: (2026)
HuRef: HUman-REadable Fingerprint for Large Language Models
by: Zeng, Boyi, et al.
Published: (2023)
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
by: Li, Hao, et al.
Published: (2023)
AHAMask: Reliable Task Specification for Large Audio Language Models without Instructions
by: Guo, Yiwei, et al.
Published: (2025)
Collaborative Editable Model
by: Tang, Kaiwen, et al.
Published: (2025)
Learning Reward for Robot Skills Using Large Language Models via Self-Alignment
by: Zeng, Yuwei, et al.
Published: (2024)
GRPO and Reflection Reward for Mathematical Reasoning in Large Language Models
by: Wang, Zhijie
Published: (2026)
Generating and Evolving Reward Functions for Highway Driving with Large Language Models
by: Han, Xu, et al.
Published: (2024)
From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models
by: Dai, Muzhi, et al.
Published: (2025)
Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models
by: Zhou, Xin, et al.
Published: (2025)
Neural Thermodynamic Laws for Large Language Model Training
by: Liu, Ziming, et al.
Published: (2025)
Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
by: Xie, Tianbao, et al.
Published: (2023)
MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
by: Zhang, Yongshun, et al.
Published: (2025)
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models
by: Yin, Zhangyue, et al.
Published: (2025)
Consolidating Trees of Robotic Plans Generated Using Large Language Models to Improve Reliability
by: Sakib, Md Sadman, et al.
Published: (2024)
Reward Models are Metrics in a Trench Coat
by: Gehrmann, Sebastian
Published: (2025)
MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models
by: Tang, Zecheng, et al.
Published: (2026)
AINav: Large Language Model-Based Adaptive Interactive Navigation
by: Zhou, Kangjie, et al.
Published: (2025)
ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models
by: Chen, Bin, et al.
Published: (2025)