| Main Authors: | Wu, Zhaofeng, Yasunaga, Michihiro, Cohen, Andrew, Kim, Yoon, Celikyilmaz, Asli, Ghazvininejad, Marjan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.11751 |
Similar Items
Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models
by: Yasunaga, Michihiro, et al.
Published: (2025)
ALMA: Alignment with Minimal Annotation
by: Yasunaga, Michihiro, et al.
Published: (2024)
Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image
by: Hu, Yushi, et al.
Published: (2025)
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment
by: Wu, Zhaofeng, et al.
Published: (2024)
David helps Goliath: Inference-Time Collaboration Between Small Specialized and Large General Diffusion LMs
by: Han, Xiaochuang, et al.
Published: (2023)
Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding
by: Liu, Jiacheng, et al.
Published: (2023)
Can You Learn Semantics Through Next-Word Prediction? The Case of Entailment
by: Merrill, William, et al.
Published: (2024)
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
by: Saha, Swarnadeep, et al.
Published: (2023)
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
by: Han, Xiaochuang, et al.
Published: (2024)
RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment
by: Yang, Kevin, et al.
Published: (2023)
Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
by: Saha, Swarnadeep, et al.
Published: (2025)
Open-Domain Text Evaluation via Contrastive Distribution Methods
by: Lu, Sidi, et al.
Published: (2023)
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
by: Gutiérrez, Bernal Jiménez, et al.
Published: (2024)
The Majority is not always right: RL training for solution aggregation
by: Zhao, Wenting, et al.
Published: (2025)
Learning to Interrupt in Language-based Multi-agent Communication
by: Wang, Danqing, et al.
Published: (2026)
Res-Bench: Benchmarking the Robustness of Multimodal Large Language Models to Dynamic Resolution Input
by: Li, Chenxu, et al.
Published: (2025)
HorizonBench: Long-Horizon Personalization with Evolving Preferences
by: Li, Shuyue Stella, et al.
Published: (2026)
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
by: Wu, Zhaofeng, et al.
Published: (2024)
Adaptive Decoding via Latent Preference Optimization
by: Dhuliawala, Shehzaad, et al.
Published: (2024)
IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation
by: Wen, Bosi, et al.
Published: (2026)
Implicit Representations of Grammaticality in Language Models
by: Wang, Yingshan Susan, et al.
Published: (2026)
Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models
by: Ding, Meidan, et al.
Published: (2025)
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
by: Liu, Yantao, et al.
Published: (2024)
Playing with Words, Improving with Rewards: Training Language Models for Creative Association
by: Deshpande, Vijeta, et al.
Published: (2026)
Representation Deficiency in Masked Language Modeling
by: Meng, Yu, et al.
Published: (2023)
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
by: Li, Lei, et al.
Published: (2024)
Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL
by: Wu, Zhaofeng, et al.
Published: (2026)
MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models
by: Tang, Zecheng, et al.
Published: (2026)
Improving Faithfulness of Abstractive Summarization by Controlling Confounding Effect of Irrelevant Sentences
by: Ghoshal, Asish, et al.
Published: (2022)
RewardBench 2: Advancing Reward Model Evaluation
by: Malik, Saumya, et al.
Published: (2025)
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
by: Jin, Zhuoran, et al.
Published: (2024)
Reward Models Can Improve Themselves: Reward-Guided Adversarial Failure Mode Discovery for Robust Reward Modeling
by: Pathmanathan, Pankayaraj, et al.
Published: (2025)
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
by: Yan, Yuchen, et al.
Published: (2025)
Explore Theory of Mind: Program-guided adversarial data generation for theory of mind reasoning
by: Sclar, Melanie, et al.
Published: (2024)
Resprompt: Residual Connection Prompting Advances Multi-Step Reasoning in Large Language Models
by: Jiang, Song, et al.
Published: (2023)
Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models
by: Tuan, Yi-Lin, et al.
Published: (2024)
Improving Chain-of-Thought Efficiency for Autoregressive Image Generation
by: Gu, Zeqi, et al.
Published: (2025)
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
by: Liu, Xuannan, et al.
Published: (2024)
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models
by: Son, Guijin, et al.
Published: (2024)
Using Perspectival Words Is Harder Than Vocabulary Words for Humans and Even More So for Multimodal Language Models
by: Dong, Dota Tianai, et al.
Published: (2025)