Staff View: Generative Data Transformation: From Mixed to Unified Data :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Jiaqing; Yin, Mingjia; Wang, Hao; Tian, Yuxin; Ye, Yuyang; Li, Yawen; Guo, Wei; Liu, Yong; Chen, Enhong
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.22743

_version_	1866918418724159488
author	Zhang, Jiaqing; Yin, Mingjia; Wang, Hao; Tian, Yuxin; Ye, Yuyang; Li, Yawen; Guo, Wei; Liu, Yong; Chen, Enhong
author_facet	Zhang, Jiaqing; Yin, Mingjia; Wang, Hao; Tian, Yuxin; Ye, Yuyang; Li, Yawen; Guo, Wei; Liu, Yong; Chen, Enhong
contents	The performance of a recommendation model is intrinsically tied to the quality, volume, and relevance of its training data. To address common challenges such as data sparsity and cold start, recent research has leveraged data from multiple auxiliary domains to enrich information within the target domain. However, inherent domain gaps can degrade the quality of mixed-domain data, leading to negative transfer and diminished model performance. The prevailing model-centric paradigm, which relies on complex, customized architectures, struggles to capture the subtle, non-structural sequence dependencies across domains, leading to poor generalization and high demands on computational resources. To address these shortcomings, we propose Taesar, a data-centric framework for target-aligned sequential regeneration, which employs a contrastive decoding mechanism to adaptively encode cross-domain context into target-domain sequences, enabling standard models to learn intricate dependencies without complex fusion architectures. Experiments show that Taesar outperforms model-centric solutions and generalizes to various sequential models. By generating enriched datasets, Taesar effectively combines the strengths of data- and model-centric paradigms. The code accompanying this paper is available at https://github.com/USTC-StarTeam/Taesar.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_22743
institution	arXiv
publishDate	2026
record_format	arxiv
title	Generative Data Transformation: From Mixed to Unified Data
topic	Artificial Intelligence
url	https://arxiv.org/abs/2602.22743