Bibliographic Details
Main Authors: Wang, Feng; Ruan, Haihang; Xie, Zhihuang; Wang, Ronggang; Yue, Xiangyu
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition; Multimedia
Online Access: https://arxiv.org/abs/2406.07645
author Wang, Feng
Ruan, Haihang
Xie, Zhihuang
Wang, Ronggang
Yue, Xiangyu
contents Recently, Neural Video Compression (NVC) techniques have achieved remarkable performance, even surpassing the best traditional lossy video codecs. However, most existing NVC methods rely heavily on transmitting Motion Vectors (MVs) to generate accurate contextual features, which has the following drawbacks. (1) Compressing and transmitting MVs requires a specialized MV encoder and decoder, which introduces redundant modules. (2) Because of the MV encoder-decoder, the training strategy is complex. In this paper, we present a novel Single Stream NVC framework (SSNVC), which removes the complex MV encoder-decoder structure and uses a one-stage training strategy. SSNVC implicitly uses temporal information by adding the previous entropy model feature to the current entropy model and by using the previous two frames to generate predicted motion information at the decoder side. In addition, we enhance the frame generator to produce higher-quality reconstructed frames. Experiments demonstrate that SSNVC can achieve state-of-the-art performance on multiple benchmarks and can greatly simplify both the compression process and the training process.
format Preprint
id arxiv_https___arxiv_org_abs_2406_07645
institution arXiv
publishDate 2024
record_format arxiv
title SSNVC: Single Stream Neural Video Compression with Implicit Temporal Information
topic Computer Vision and Pattern Recognition
Multimedia
url https://arxiv.org/abs/2406.07645