Bibliographic Details
Main Author: Zhu, Chanhui
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2603.05933
Abstract:
  • Applying Small Language Models (SLMs) to character-driven generation in Chinese remains challenging because of data scarcity and the difficulty of disentangling character style. Standard Supervised Fine-Tuning (SFT) often captures surface-level semantics but frequently produces Out-Of-Character (OOC) outputs. We frame the problem as a controlled sentence-level style-rewriting task, which isolates stylistic quality from dialogue-context management. We propose a Structured Style-Rewrite Framework that decomposes character style into three interpretable dimensions (format signature, syntax, and pragmatics) and combines this decomposition with Chain-of-Thought (CoT) supervision for explicit style planning. A CoT-Shared Direct Preference Optimization (DPO) stage then aligns style planning with surface realization, ensuring that preference learning targets output-level style execution rather than differences between reasoning traces. Experiments on eight characters drawn from four diverse source domains show that our method enables a Qwen3-1.7B model, running on consumer hardware, to achieve a Valid Style Score of 0.632 while maintaining strong semantic fidelity (0.878). The model lies on the Pareto frontier of the evaluated systems and outperforms significantly larger baselines such as GLM-4.7.
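The abstract does not spell out the CoT-Shared DPO objective. Below is a minimal sketch under stated assumptions. It uses the standard DPO loss, -log sigmoid(beta * margin), and assumes "CoT-shared" means that the chosen and rejected responses reuse one identical CoT prefix, so only the output tokens after that prefix can move the preference margin. The function names (`sequence_logprob`, `dpo_loss`, `cot_shared_dpo_loss`), the value `beta = 0.1`, and all token log-probabilities are hypothetical toy values, not the paper's implementation or data.

```python
import math
from typing import List


def sequence_logprob(token_logprobs: List[float], start: int = 0) -> float:
    """Sum per-token log-probs from index `start` onward.

    Setting start = CoT length drops the shared reasoning prefix (assumption).
    """
    return sum(token_logprobs[start:])


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(pc: float, rc: float, pr: float, rr: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair of sequence log-probs.

    pc/rc: policy/reference log-prob of the chosen response;
    pr/rr: policy/reference log-prob of the rejected response.
    """
    margin = beta * ((pc - rc) - (pr - rr))
    return softplus(-margin)  # -log(sigmoid(m)) == softplus(-m)


def cot_shared_dpo_loss(pol_chosen, ref_chosen, pol_rejected, ref_rejected,
                        cot_len: int, beta: float = 0.1) -> float:
    """Hypothetical CoT-shared variant: both responses start with the same CoT
    tokens, and only tokens after index `cot_len` enter the margin."""
    return dpo_loss(sequence_logprob(pol_chosen, cot_len),
                    sequence_logprob(ref_chosen, cot_len),
                    sequence_logprob(pol_rejected, cot_len),
                    sequence_logprob(ref_rejected, cot_len), beta)


# Toy log-probs: one shared 3-token CoT followed by a 2-token styled output.
cot_pol = [-0.5, -0.7, -0.2]  # shared CoT under the policy model
cot_ref = [-0.6, -0.6, -0.3]  # the same CoT tokens under the reference model
pol_chosen = cot_pol + [-0.3, -0.4]
pol_rejected = cot_pol + [-1.2, -0.9]
ref_chosen = cot_ref + [-0.5, -0.5]
ref_rejected = cot_ref + [-0.8, -0.8]

masked = cot_shared_dpo_loss(pol_chosen, ref_chosen, pol_rejected, ref_rejected, cot_len=3)
full = cot_shared_dpo_loss(pol_chosen, ref_chosen, pol_rejected, ref_rejected, cot_len=0)
print(round(masked, 4), math.isclose(masked, full))
```

Because the CoT prefix is identical for both responses, its log-probability terms cancel in the margin. The masked and unmasked losses therefore agree, and only the output-level difference drives the gradient. That is one way to read the abstract's claim that preference learning targets style execution rather than differences between reasoning traces. If the chosen and rejected responses carried different CoTs, those tokens would no longer cancel.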