Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Ziqi; Yang, Peng; Liang, Yuxin; Liu, Mingliu; Lu, Jia
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.08835

_version_	1866918492426469376
author	Zhou, Ziqi; Yang, Peng; Liang, Yuxin; Liu, Mingliu; Lu, Jia
author_facet	Zhou, Ziqi; Yang, Peng; Liang, Yuxin; Liu, Mingliu; Lu, Jia
contents	The expansion of artificial intelligence-generated content services requires diffusion model serving to achieve high throughput and low end-to-end (E2E) task latency simultaneously. However, existing continuous batching methods suffer from severe resource contention when the UNet and VAE run concurrently, which leads to latency spikes. Furthermore, concurrent multi-task scheduling entails a trade-off between UNet throughput and VAE latency that varies with the scheduling strategy. To address these issues, we propose SynerDiff, an efficient continuous batching system built on synergy between the intra-concurrency and inter-concurrency levels. At the intra-concurrency level, SynerDiff alleviates resource contention by pruning component-specific resource bottlenecks via VAE Chunking and Adaptive Skip-CFG. At the inter-concurrency level, leveraging the components' differing sensitivity to scheduling granularity, a threshold-aware scheduler plans concurrent sequences and tunes intra-concurrency decisions to minimize VAE latency while keeping the UNet within a high-throughput threshold. Additionally, a feedback controller dynamically adjusts this threshold based on queue load to raise the system's capacity ceiling. Experimental results show that SynerDiff improves throughput by 1.6× and decreases both average E2E and P99 tail latencies by up to 78.7% compared to benchmarks, while preserving high image fidelity.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_08835
institution	arXiv
publishDate	2026
record_format	arxiv
title	SynerDiff: Synergetic Continuous Batching for Fast and Parallel Diffusion Model Inference
topic	Artificial Intelligence
url	https://arxiv.org/abs/2605.08835

Similar Items