Bibliographic Details
Main Authors: Zhou, Ziqi, Yang, Peng, Liang, Yuxin, Liu, Mingliu, Lu, Jia
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2605.08835
author Zhou, Ziqi
Yang, Peng
Liang, Yuxin
Liu, Mingliu
Lu, Jia
contents The expansion of AI-generated content (AIGC) services requires diffusion model serving to achieve high throughput and low end-to-end (E2E) task latency simultaneously. However, existing continuous batching methods suffer from severe resource contention when the UNet and VAE run concurrently, leading to latency spikes. Furthermore, concurrent multi-task scheduling entails a trade-off between UNet throughput and VAE latency across scheduling strategies. To address these challenges, we propose SynerDiff, an efficient continuous batching system built on intra- and inter-level synergy. At the intra-concurrency level, SynerDiff alleviates resource contention by pruning component-specific resource bottlenecks via VAE Chunking and Adaptive Skip-CFG. At the inter-concurrency level, exploiting the components' differing sensitivity to scheduling granularity, a threshold-aware scheduler plans concurrent sequences and tunes intra-concurrency decisions to minimize VAE latency while keeping the UNet within a high-throughput threshold. Additionally, a feedback controller dynamically adjusts this threshold based on queue load to raise the system's capacity ceiling. Experimental results show that SynerDiff improves throughput by 1.6× and reduces both average E2E and P99 tail latencies by up to 78.7% compared with baseline systems, while preserving high image fidelity.
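The abstract describes the queue-driven feedback controller only at a high level. The following is a minimal sketch of one way such a controller could work, under loudly stated assumptions: the threshold is treated as a floor on the fraction of peak UNet throughput the scheduler must preserve, and it is adjusted by a clamped proportional rule on queue length. The class name `QueueFeedbackController` and every parameter value are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class QueueFeedbackController:
    """Hypothetical sketch, not SynerDiff's published controller.

    Assumption: `threshold` is the minimum fraction of peak UNet throughput
    that the scheduler must preserve (lower <= threshold <= upper).
    """

    threshold: float = 0.8   # assumed starting throughput floor
    lower: float = 0.5       # assumed lower bound on the floor
    upper: float = 0.95      # assumed upper bound on the floor
    target_queue: int = 8    # assumed desired number of waiting requests (> 0)
    gain: float = 0.05       # assumed proportional gain

    def update(self, queue_len: int) -> float:
        """Adjust the threshold from the current queue length and return it."""
        # Normalized error: positive when requests pile up, negative when draining.
        error = (queue_len - self.target_queue) / self.target_queue
        # Proportional step: a longer queue demands more UNet throughput,
        # a shorter queue relaxes the floor so the scheduler can favor VAE latency.
        proposed = self.threshold + self.gain * error
        # Clamp so the floor never leaves its configured bounds.
        self.threshold = min(self.upper, max(self.lower, proposed))
        return self.threshold


if __name__ == "__main__":
    ctrl = QueueFeedbackController()
    for queue_len in [8, 16, 32, 4, 0]:
        print(queue_len, round(ctrl.update(queue_len), 3))
```

Under these assumptions, a growing queue pushes the floor up so the scheduler favors large UNet batches, while a draining queue lowers it so the scheduler can prioritize VAE latency; the clamp keeps the floor within fixed bounds regardless of load spikes.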
format Preprint
id arxiv_https___arxiv_org_abs_2605_08835
institution arXiv
publishDate 2026
record_format arxiv
title SynerDiff: Synergetic Continuous Batching for Fast and Parallel Diffusion Model Inference
topic Artificial Intelligence
url https://arxiv.org/abs/2605.08835