Staff View: Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling :: Library Catalog

Bibliographic Details
Main Authors:	Yan, Jiebin; Wu, Lei; Fang, Yuming; Liu, Xuelin; Xia, Xue; Liu, Weide
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2501.07087

_version_	1866916564340572160
author	Yan, Jiebin; Wu, Lei; Fang, Yuming; Liu, Xuelin; Xia, Xue; Liu, Weide
author_facet	Yan, Jiebin; Wu, Lei; Fang, Yuming; Liu, Xuelin; Xia, Xue; Liu, Weide
contents	With the rapid development of multimedia processing and deep learning technologies, especially in the field of video understanding, video quality assessment (VQA) has achieved significant progress. Although researchers have moved from designing efficient video quality mapping models to various other research directions, the effectiveness-efficiency trade-offs of spatio-temporal modeling in VQA models have not yet been explored in depth. Because videos contain highly redundant information, this paper investigates the problem from the perspective of joint spatial and temporal sampling, asking how little information must be kept when feeding videos into VQA models while keeping the performance sacrifice acceptable. To this end, we drastically sample the video's information along both the spatial and temporal dimensions, and the heavily squeezed video is then fed into a stable VQA model. Comprehensive experiments on joint spatial and temporal sampling are conducted on six public video quality databases, and the results demonstrate that the VQA model retains acceptable performance even when most of the video information is discarded. Furthermore, with the proposed joint spatial and temporal sampling strategy, we make an initial attempt to design an online VQA model, which is instantiated with a spatial feature extractor, a temporal feature fusion module, and a global quality regression module, each kept as simple as possible. Through quantitative and qualitative experiments, we verify the feasibility of an online VQA model obtained by simplifying the model itself and reducing its input.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_07087
institution	arXiv
publishDate	2025
record_format	arxiv
title	Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2501.07087

Similar Items