Bibliographic Details
Main Authors: Ma, Jian, Zhu, Xujie, Pan, Zihao, Peng, Qirong, Guo, Xu, Chen, Chen, Lu, Haonan
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2508.07607
_version_ 1866909893727879168
author Ma, Jian
Zhu, Xujie
Pan, Zihao
Peng, Qirong
Guo, Xu
Chen, Chen
Lu, Haonan
contents Existing open-source datasets for arbitrary-instruction image editing remain suboptimal, and a plug-and-play editing module compatible with community-prevalent generative models is notably absent. In this paper, we first introduce the X2Edit Dataset, a comprehensive dataset covering 14 diverse editing tasks, including subject-driven generation. We use industry-leading unified image generation models and expert models to construct the data. We also design reasonable editing instructions with a VLM and apply several scoring mechanisms to filter the data. As a result, we construct 3.7 million high-quality samples with balanced categories. Second, to integrate seamlessly with community image generation models, we design task-aware MoE-LoRA training based on FLUX.1, with only 8% of the parameters of the full model. To further improve performance, we use the internal representations of the diffusion model and define positive/negative samples based on image editing types to introduce contrastive learning. Extensive experiments demonstrate that the model's editing performance is competitive with many strong existing models. Additionally, the constructed dataset exhibits substantial advantages over existing open-source datasets. The open-source code, checkpoints, and datasets for X2Edit are available at https://github.com/OPPO-Mente-Lab/X2Edit.
format Preprint
id arxiv_https___arxiv_org_abs_2508_07607
institution arXiv
publishDate 2025
record_format arxiv
title X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2508.07607