| Main Authors: | Wang, Xinda, Hou, Zhengxu, Zhang, Yangshijie, Yan, Bingren, Liu, Jialin, Zhao, Chenzhuo, Yang, Zhibo, Yang, Bin-Bin, Xiao, Feng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.11522 |
Similar Items
EvolvR: Self-Evolving Pairwise Reasoning for Story Evaluation to Enhance Generation
by: Wang, Xinda, et al.
Published: (2025)
Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent
by: Zhang, Yangshijie, et al.
Published: (2025)
TASE: Token Awareness and Structured Evaluation for Multilingual Language Models
by: Zhao, Chenzhuo, et al.
Published: (2025)
PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models
by: Zhao, Chenzhuo, et al.
Published: (2025)
Emoti-Attack: Zero-Perturbation Adversarial Attacks on NLP Systems via Emoji Sequences
by: Zhang, Yangshijie
Published: (2025)
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
by: Peng, Hao, et al.
Published: (2025)
NoveltyRank: A Retrieval-Augmented Framework for Conceptual Novelty Estimation in AI Research
by: Yan, Zhengxu, et al.
Published: (2025)
MedReflect: Teaching Medical LLMs to Self-Improve via Reflective Correction
by: Huang, Yue, et al.
Published: (2025)
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
by: Yang, Ling, et al.
Published: (2024)
EvoSpark: Endogenous Interactive Agent Societies for Unified Long-Horizon Narrative Evolution
by: He, Shiyu, et al.
Published: (2026)
CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards
by: Lin, Zhiming, et al.
Published: (2025)
No Query, No Access
by: Wang, Wenqiang, et al.
Published: (2025)
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
by: Zhang, Yi-Fan, et al.
Published: (2025)
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
by: Wen, Xumeng, et al.
Published: (2025)
StoryAlign: Evaluating and Training Reward Models for Story Generation
by: Xia, Haotian, et al.
Published: (2026)
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
by: Liu, Chris Yuhao, et al.
Published: (2024)
GRAM: A Generative Foundation Reward Model for Reward Generalization
by: Wang, Chenglong, et al.
Published: (2025)
ReSeek: A Self-Correcting Framework for Search Agents with Instructive Rewards
by: Li, Shiyu, et al.
Published: (2025)
RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing
by: Liao, Jianxing, et al.
Published: (2025)
Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards
by: Han, Tianyang, et al.
Published: (2026)
Self-Correction Makes LLMs Better Parsers
by: Zhang, Ziyan, et al.
Published: (2025)
Pearmut: Human Evaluation of Translation Made Trivial
by: Zouhar, Vilém, et al.
Published: (2026)
ARIA: Training Language Agents with Intention-Driven Reward Aggregation
by: Yang, Ruihan, et al.
Published: (2025)
WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection
by: He, Guanzhong, et al.
Published: (2025)
Incomplete In-context Learning
by: Wang, Wenqiang, et al.
Published: (2025)
HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models
by: Jiang, Songtao, et al.
Published: (2025)
Unveiling the Impact of Multimodal Features on Chinese Spelling Correction: From Analysis to Design
by: Zhang, Xiaowu, et al.
Published: (2025)
SAFER: Advancing Safety Alignment via Efficient Ex-Ante Reasoning
by: Feng, Kehua, et al.
Published: (2025)
Long-form RewardBench: Evaluating Reward Models for Long-form Generation
by: Huang, Hui, et al.
Published: (2026)
VERINA: Benchmarking Verifiable Code Generation
by: Ye, Zhe, et al.
Published: (2025)
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
by: Zhang, Jiajie, et al.
Published: (2026)
VCap: Hypergeometric Rewards for Weak-to-Strong Visual Captioning
by: Lu, Xingyu, et al.
Published: (2026)
Towards Robust Process Reward Modeling via Noise-aware Learning
by: Xie, Bin, et al.
Published: (2026)
PERM: Psychology-grounded Empathetic Reward Modeling for Large Language Models
by: Wang, Chengbing, et al.
Published: (2026)
Small Reward Models via Backward Inference
by: Wang, Yike, et al.
Published: (2026)
Beyond Correctness: Rewarding Faithful Reasoning in Retrieval-Augmented Generation
by: Xu, Zhichao, et al.
Published: (2025)
ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models
by: Zheng, Kangjie, et al.
Published: (2025)
Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity
by: Shuai, Zitao, et al.
Published: (2024)
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
by: Hong, Haitao, et al.
Published: (2025)
MPO: Multilingual Safety Alignment via Reward Gap Optimization
by: Zhao, Weixiang, et al.
Published: (2025)