Staff View: EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling :: Library Catalog

Bibliographic Details
Main Authors:	Luo, Xin; Wang, Jiahao; Wu, Chenyuan; Xiao, Shitao; Jiang, Xiyan; Lian, Defu; Zhang, Jiajun; Liu, Dong; Liu, Zheng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2509.23909

_version_	1866917237614444544
author	Luo, Xin; Wang, Jiahao; Wu, Chenyuan; Xiao, Shitao; Jiang, Xiyan; Lian, Defu; Zhang, Jiajun; Liu, Dong; Liu, Zheng
author_facet	Luo, Xin; Wang, Jiahao; Wu, Chenyuan; Xiao, Shitao; Jiang, Xiyan; Lian, Defu; Zhang, Jiajun; Liu, Dong; Liu, Zheng
contents	Instruction-guided image editing has achieved remarkable progress, yet current models still face challenges with complex instructions and often require multiple samples to produce a desired result. Reinforcement Learning (RL) offers a promising solution, but its adoption in image editing has been severely hindered by the lack of a high-fidelity, efficient reward signal. In this work, we present a comprehensive methodology to overcome this barrier, centered on the development of a state-of-the-art, specialized reward model. We first introduce EditReward-Bench, a comprehensive benchmark to systematically evaluate reward models on editing quality. Building on this benchmark, we develop EditScore, a series of reward models (7B-72B) for evaluating the quality of instruction-guided image editing. Through meticulous data curation and filtering, EditScore effectively matches the performance of leading proprietary VLMs. Furthermore, coupled with an effective self-ensemble strategy tailored for the generative nature of EditScore, our largest variant even surpasses GPT-5 in the benchmark. We then demonstrate that a high-fidelity reward model is the key to unlocking online RL for image editing. Our experiments show that, while even the largest open-source VLMs fail to provide an effective learning signal, EditScore enables efficient and robust policy optimization. Applying our framework to a strong base model, OmniGen2, results in a final model that shows a substantial and consistent performance uplift. Overall, this work provides the first systematic path from benchmarking to reward modeling to RL training in image editing, showing that a high-fidelity, domain-specialized reward model is the key to unlocking the full potential of RL in this domain.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_23909
institution	arXiv
publishDate	2025
record_format	arxiv
title	EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2509.23909

Similar Items