Staff View: LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation :: Library Catalog

Bibliographic Details
Main Authors:	Xing, Jiazheng; Du, Fei; Yuan, Hangjie; Liu, Pengwei; Xu, Hongbin; Ci, Hai; Niu, Ruigang; Chen, Weihua; Wang, Fan; Liu, Yong
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.20192

_version_	1866908902945193984
author	Xing, Jiazheng; Du, Fei; Yuan, Hangjie; Liu, Pengwei; Xu, Hongbin; Ci, Hai; Niu, Ruigang; Chen, Weihua; Wang, Fan; Liu, Yong
author_facet	Xing, Jiazheng; Du, Fei; Yuan, Hangjie; Liu, Pengwei; Xu, Hongbin; Ci, Hai; Niu, Ruigang; Chen, Weihua; Wang, Fan; Liu, Yong
contents	Recent advances in diffusion models have significantly improved text-to-video generation, enabling personalized content creation with fine-grained control over both foreground and background elements. However, precise face-attribute alignment across subjects remains challenging, as existing methods lack explicit mechanisms to ensure intra-group consistency. Addressing this gap requires both explicit modeling strategies and face-attribute-aware data resources. We therefore propose LumosX, a framework that advances both data and model design. On the data side, a tailored collection pipeline orchestrates captions and visual cues from independent videos, while multimodal large language models (MLLMs) infer and assign subject-specific dependencies. These extracted relational priors impose a finer-grained structure that amplifies the expressive control of personalized video generation and enables the construction of a comprehensive benchmark. On the modeling side, Relational Self-Attention and Relational Cross-Attention intertwine position-aware embeddings with refined attention dynamics to inscribe explicit subject-attribute dependencies, enforcing disciplined intra-group cohesion and amplifying the separation between distinct subject clusters. Comprehensive evaluations on our benchmark demonstrate that LumosX achieves state-of-the-art performance in fine-grained, identity-consistent, and semantically aligned personalized multi-subject video generation. Code and models are available at https://jiazheng-xing.github.io/lumosx-home/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_20192
institution	arXiv
publishDate	2026
record_format	arxiv
title	LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2603.20192