ReDiStory: Region-Disentangled Diffusion for Consistent Visual Story Generation

Bibliographic Details
Main Authors:	Sarkar, Ayushman; Yu, Zhenyu; Chen, Chu; Tang, Wei; Cui, Kangning; Idris, Mohd Yamani Idna
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.01303

author	Sarkar, Ayushman; Yu, Zhenyu; Chen, Chu; Tang, Wei; Cui, Kangning; Idris, Mohd Yamani Idna
contents	Generating coherent visual stories requires maintaining subject identity across multiple images while preserving frame-specific semantics. Recent training-free methods concatenate identity and frame prompts into a unified representation, but this often introduces inter-frame semantic interference that weakens identity preservation in complex stories. We propose ReDiStory, a training-free framework that improves multi-frame story generation via inference-time prompt embedding reorganization. ReDiStory explicitly decomposes text embeddings into identity-related and frame-specific components, then decorrelates frame embeddings by suppressing shared directions across frames. This reduces cross-frame interference without modifying diffusion parameters or requiring additional supervision. Under identical diffusion backbones and inference settings, ReDiStory improves identity consistency while maintaining prompt fidelity. Experiments on the ConsiStory+ benchmark show consistent gains over 1Prompt1Story on multiple identity consistency metrics. Code is available at: https://github.com/YuZhenyuLindy/ReDiStory
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_01303
institution	arXiv
publishDate	2026
record_format	arxiv
title	ReDiStory: Region-Disentangled Diffusion for Consistent Visual Story Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.01303