Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Liyi, Chen, Pengfei, Wang, Guowen, Zhang, Zhiyuan, Ma, Lei, Zhang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.17841
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866912972942606336
author	Liyi, Chen Pengfei, Wang Guowen, Zhang Zhiyuan, Ma Lei, Zhang
author_facet	Liyi, Chen Pengfei, Wang Guowen, Zhang Zhiyuan, Ma Lei, Zhang
contents	Most instruction-driven 3D editing methods rely on 2D models to guide the explicit and iterative optimization of 3D representations. This paradigm, however, suffers from two primary drawbacks. First, it lacks a universal design of different 3D editing tasks because the explicit manipulation of 3D geometry necessitates task-dependent rules, e.g., 3D appearance editing demands inherent source 3D geometry, while 3D removal alters source geometry. Second, the iterative optimization process is highly time-consuming, often requiring thousands of invocations of 2D/3D updating. We present Omni-3DEdit, a unified, learning-based model that generalizes various 3D editing tasks implicitly. One key challenge to achieve our goal is the scarcity of paired source-edited multi-view assets for training. To address this issue, we construct a data pipeline, synthesizing a relatively rich number of high-quality paired multi-view editing samples. Subsequently, we adapt the pre-trained generative model SEVA as our backbone by concatenating source view latents along with conditional tokens in sequence space. A dual-stream LoRA module is proposed to disentangle different view cues, largely enhancing our model's representational learning capability. As a learning-based model, our model is free of the time-consuming online optimization, and it can complete various 3D editing tasks in one forward pass, reducing the inference time from tens of minutes to approximately two minutes. Extensive experiments demonstrate the effectiveness and efficiency of Omni-3DEdit.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_17841
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Omni-3DEdit: Generalized Versatile 3D Editing in One-Pass Liyi, Chen Pengfei, Wang Guowen, Zhang Zhiyuan, Ma Lei, Zhang Computer Vision and Pattern Recognition Most instruction-driven 3D editing methods rely on 2D models to guide the explicit and iterative optimization of 3D representations. This paradigm, however, suffers from two primary drawbacks. First, it lacks a universal design of different 3D editing tasks because the explicit manipulation of 3D geometry necessitates task-dependent rules, e.g., 3D appearance editing demands inherent source 3D geometry, while 3D removal alters source geometry. Second, the iterative optimization process is highly time-consuming, often requiring thousands of invocations of 2D/3D updating. We present Omni-3DEdit, a unified, learning-based model that generalizes various 3D editing tasks implicitly. One key challenge to achieve our goal is the scarcity of paired source-edited multi-view assets for training. To address this issue, we construct a data pipeline, synthesizing a relatively rich number of high-quality paired multi-view editing samples. Subsequently, we adapt the pre-trained generative model SEVA as our backbone by concatenating source view latents along with conditional tokens in sequence space. A dual-stream LoRA module is proposed to disentangle different view cues, largely enhancing our model's representational learning capability. As a learning-based model, our model is free of the time-consuming online optimization, and it can complete various 3D editing tasks in one forward pass, reducing the inference time from tens of minutes to approximately two minutes. Extensive experiments demonstrate the effectiveness and efficiency of Omni-3DEdit.
title	Omni-3DEdit: Generalized Versatile 3D Editing in One-Pass
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.17841

Similar Items