| Main Authors: | Li, Yi-Chen, Xu, Tian, Yu, Yang, Zhang, Xuqin, Chen, Xiong-Hui, Ling, Zhongxiang, Chao, Ningjing, Yuan, Lei, Zhou, Zhi-Hua |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.23235 |
Similar Items
Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference
by: Qiu, Wenjie, et al.
Published: (2025)
URPO: A Unified Reward & Policy Optimization Framework for Large Language Models
by: Lu, Songshuo, et al.
Published: (2025)
Towards Generalist Prompting for Large Language Models by Mental Models
by: Guan, Haoxiang, et al.
Published: (2024)
Inference-Time Scaling for Generalist Reward Modeling
by: Liu, Zijun, et al.
Published: (2025)
An Explicit Syllogistic Legal Reasoning Framework for Large Language Models
by: Zhang, Kepu, et al.
Published: (2025)
Off-Policy Value-Based Reinforcement Learning for Large Language Models
by: Wang, Peng-Yuan, et al.
Published: (2026)
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
by: Jin, Zhuoran, et al.
Published: (2025)
From Generalist to Specialist: A Survey of Large Language Models for Chemistry
by: Han, Yang, et al.
Published: (2024)
Tool-Augmented Reward Modeling
by: Li, Lei, et al.
Published: (2023)
Exploring the Nexus of Large Language Models and Legal Systems: A Short Survey
by: Qin, Weicong, et al.
Published: (2024)
Geometry-Calibrated Conformal Abstention for Language Models
by: Xu, Rui, et al.
Published: (2026)
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
by: Liu, Qiyuan, et al.
Published: (2025)
ProBench: Benchmarking Large Language Models in Competitive Programming
by: Yang, Lei, et al.
Published: (2025)
Neighboring Perturbations of Knowledge Editing on Large Language Models
by: Ma, Jun-Yu, et al.
Published: (2024)
CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards
by: Zhang, Taolin, et al.
Published: (2025)
Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models
by: Lu, Yi-Fan, et al.
Published: (2024)
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
by: Li, Zherui, et al.
Published: (2025)
BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks
by: Zhang, Kai, et al.
Published: (2023)
Aligning Large Language Models with Searcher Preferences
by: Wu, Wei, et al.
Published: (2026)
Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models
by: Ding, Meidan, et al.
Published: (2025)
ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models
by: Chen, Bin, et al.
Published: (2025)
Generalists vs. Specialists: Evaluating Large Language Models for Urdu
by: Arif, Samee, et al.
Published: (2024)
Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
by: Gu, Jia-Chen, et al.
Published: (2024)
Sparse Reward Subsystem in Large Language Models
by: Xu, Guowei, et al.
Published: (2026)
HFT: Half Fine-Tuning for Large Language Models
by: Hui, Tingfeng, et al.
Published: (2024)
Behavioral Fingerprinting of Large Language Models
by: Pei, Zehua, et al.
Published: (2025)
A Survey on Large Language Models for Mathematical Reasoning
by: Wang, Peng-Yuan, et al.
Published: (2025)
PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
by: He, Bowei, et al.
Published: (2025)
Hey, That's My Data! Token-Only Dataset Inference in Large Language Models
by: Xiong, Chen, et al.
Published: (2025)
LargePiG: Your Large Language Model is Secretly a Pointer Generator
by: Sun, Zhongxiang, et al.
Published: (2024)
TESS 2: A Large-Scale Generalist Diffusion Language Model
by: Tae, Jaesung, et al.
Published: (2025)
Generalist Large Language Models Outperform Clinical Tools on Medical Benchmarks
by: Vishwanath, Krithik, et al.
Published: (2025)
Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model
by: Yuan, Chenhan, et al.
Published: (2024)
Reward Models Identify Consistency, Not Causality
by: Xu, Yuhui, et al.
Published: (2025)
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data
by: Dorfner, Felix J., et al.
Published: (2024)
LongReward: Improving Long-context Large Language Models with AI Feedback
by: Zhang, Jiajie, et al.
Published: (2024)
Alignment for Efficient Tool Calling of Large Language Models
by: Xu, Hongshen, et al.
Published: (2025)
PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models
by: Wang, Chengbing, et al.
Published: (2026)
Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only
by: Zhang, Qingru, et al.
Published: (2025)
Self-Rewarding Language Models
by: Yuan, Weizhe, et al.
Published: (2024)