Bibliographic Details
Main Authors: Zhang, Maojun, Wu, Haotian, Jin, Richeng, Gunduz, Deniz, Mikolajczyk, Krystian
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2602.05201
Table of Contents:
  • Modern video codecs and learning-based approaches struggle with semantic reconstruction at extremely low bit-rates because they rely on low-level spatiotemporal redundancies. Generative models, especially diffusion models, offer a new paradigm for video compression by leveraging high-level semantic understanding and powerful visual synthesis. This paper proposes a video compression framework that integrates generative priors to drastically reduce bit-rate while maintaining reconstruction fidelity. Specifically, our method compresses high-level semantic representations of the video, then uses a conditional diffusion model to reconstruct frames from these semantics. To further improve compression, we characterize motion information with global camera trajectories and foreground segmentation: background motion is compactly represented by camera pose parameters, and foreground dynamics by sparse segmentation masks. This significantly boosts compression efficiency, enabling decent video reconstruction at extremely low bit-rates.