| Main authors: | Zhou, Hang; Zuo, Xinxin; Wang, Sen; Cheng, Li |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online access: | https://arxiv.org/abs/2603.06873 |
| _version_ | 1866912950722232320 |
|---|---|
| author | Zhou, Hang; Zuo, Xinxin; Wang, Sen; Cheng, Li |
| author_facet | Zhou, Hang; Zuo, Xinxin; Wang, Sen; Cheng, Li |
| contents | Despite strong single-turn performance, diffusion-based image compositing often struggles to preserve coherent spatial relations in pairwise or sequential edits, where subsequent insertions may overwrite previously generated content and disrupt physical consistency. We introduce PICS, a self-supervised composition-by-decomposition paradigm that composes objects in parallel while explicitly modeling the compositional interactions among (fully-/partially-)visible objects and background. At its core, an Interaction Transformer employs mask-guided Mixture-of-Experts to route background, exclusive, and overlap regions to dedicated experts, with an adaptive α-blending strategy that infers a compatibility-aware fusion of overlapping objects while preserving boundary fidelity. To further enhance robustness to geometric variations, we incorporate geometry-aware augmentations covering both out-of-plane and in-plane pose changes of objects. Our method delivers superior pairwise compositing quality and substantially improved stability, with extensive evaluations across virtual try-on, indoor, and street scene settings showing consistent gains over state-of-the-art baselines. Code and data are available at https://github.com/RyanHangZhou/PICS |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2603_06873 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | PICS: Pairwise Image Compositing with Spatial Interactions |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2603.06873 |
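
The abstract in this record describes routing background, exclusive, and overlap regions to separate experts and blending overlapping objects with an adaptive α. As a rough illustration of that region split only, the sketch below partitions two binary object masks into those regions and blends the overlap with a fixed α. It is not the authors' implementation: the function names, the label constants, and the constant `alpha` (a stand-in for the learned, adaptive α) are all assumptions made for this example.

```python
from typing import List

Mask = List[List[int]]      # binary mask: 1 where the object is present
Image = List[List[float]]   # single-channel image, same shape as the masks

# Hypothetical region labels, chosen for this sketch only.
BACKGROUND, ONLY_A, ONLY_B, OVERLAP = 0, 1, 2, 3


def partition_regions(mask_a: Mask, mask_b: Mask) -> List[List[int]]:
    """Label each pixel as background, exclusive to A, exclusive to B, or overlap."""
    # a + 2*b maps (0,0)->0, (1,0)->1, (0,1)->2, (1,1)->3, matching the labels above.
    return [
        [a + 2 * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(mask_a, mask_b)
    ]


def composite(bg: Image, obj_a: Image, obj_b: Image,
              mask_a: Mask, mask_b: Mask, alpha: float = 0.5) -> Image:
    """Compose two objects over a background, alpha-blending only where they overlap.

    `alpha` is a fixed constant here (an assumption); the abstract describes
    an adaptive, inferred blending weight instead.
    """
    labels = partition_regions(mask_a, mask_b)
    out: Image = []
    for y, row in enumerate(labels):
        out_row = []
        for x, label in enumerate(row):
            if label == BACKGROUND:
                value = bg[y][x]
            elif label == ONLY_A:
                value = obj_a[y][x]
            elif label == ONLY_B:
                value = obj_b[y][x]
            else:  # OVERLAP: weighted mix of the two objects
                value = alpha * obj_a[y][x] + (1 - alpha) * obj_b[y][x]
            out_row.append(value)
        out.append(out_row)
    return out


if __name__ == "__main__":
    mask_a = [[1, 1, 0, 0]]
    mask_b = [[0, 1, 1, 0]]
    print(partition_regions(mask_a, mask_b))
    print(composite([[0.0] * 4], [[1.0] * 4], [[0.5] * 4], mask_a, mask_b))
```

In this toy setup the second pixel is the only overlap, so it is the only one affected by `alpha`. Every other pixel is copied directly from the background or from a single object.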