Staff View: Discrete Preference Learning for Personalized Multimodal Generation :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yuting; Sun, Ying; Shen, Dazhong; Xie, Ziwei; Liu, Feng; Zhang, Changwang; Liu, Xiang; Wang, Jun; Xiong, Hui
Format:	Preprint
Published:	2026
Subjects:	Information Retrieval
Online Access:	https://arxiv.org/abs/2604.20434

_version_	1866913054737825792
author	Zhang, Yuting; Sun, Ying; Shen, Dazhong; Xie, Ziwei; Liu, Feng; Zhang, Changwang; Liu, Xiang; Wang, Jun; Xiong, Hui
author_facet	Zhang, Yuting; Sun, Ying; Shen, Dazhong; Xie, Ziwei; Liu, Feng; Zhang, Changwang; Liu, Xiang; Wang, Jun; Xiong, Hui
contents	The emergence of generative models enables the creation of texts and images tailored to users' preferences. Existing personalized generative models have two critical limitations: they lack a dedicated paradigm for accurate preference modeling, and they generate unimodal content even though real-world user interactions are multimodal. We therefore propose personalized multimodal generation, which captures modal-specific preferences from multimodal interactions via a dedicated preference model and then feeds them into downstream generators to produce personalized multimodal content. This task presents two challenges: (1) a gap between the continuous preferences produced by dedicated modeling and the discrete token inputs intrinsic to generator architectures; and (2) potential inconsistency between generated images and texts. To tackle these, we present a two-stage framework called Discrete Preference learning for Personalized Multimodal Generation (DPPMG). In the first stage, we introduce a modal-specific graph neural network (a dedicated preference model) to learn users' modal-specific preferences, which are then quantized into discrete preference tokens. In the second stage, the discrete modal-specific preference tokens are injected into downstream text and image generators. To further enhance cross-modal consistency while preserving personalization, we design a cross-modal consistent and personalized reward to fine-tune token-associated parameters. Extensive experiments on two real-world datasets demonstrate the effectiveness of our model in generating personalized and consistent multimodal content.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_20434
institution	arXiv
publishDate	2026
record_format	arxiv
title	Discrete Preference Learning for Personalized Multimodal Generation
topic	Information Retrieval
url	https://arxiv.org/abs/2604.20434
