Staff View: FVQ: A Large-Scale Dataset and an LMM-based Method for Face Video Quality Assessment

Bibliographic Details
Main Authors:	Wu, Sijing; Li, Yunhao; Xu, Ziwen; Gao, Yixuan; Duan, Huiyu; Sun, Wei; Zhai, Guangtao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.09255

_version_	1866918154210377728
author	Wu, Sijing; Li, Yunhao; Xu, Ziwen; Gao, Yixuan; Duan, Huiyu; Sun, Wei; Zhai, Guangtao
author_facet	Wu, Sijing; Li, Yunhao; Xu, Ziwen; Gao, Yixuan; Duan, Huiyu; Sun, Wei; Zhai, Guangtao
contents	Face video quality assessment (FVQA) deserves to be explored in addition to general video quality assessment (VQA), as face videos are the primary content on social media platforms and the human visual system (HVS) is particularly sensitive to human faces. However, FVQA is rarely explored due to the lack of large-scale FVQA datasets. To fill this gap, we present the first large-scale in-the-wild FVQA dataset, FVQ-20K, which contains 20,000 in-the-wild face videos together with corresponding mean opinion score (MOS) annotations. Along with the FVQ-20K dataset, we further propose a specialized FVQA method named FVQ-Rater to achieve human-like rating and scoring for face videos, which is the first attempt to explore the potential of large multimodal models (LMMs) for the FVQA task. Concretely, we extract multi-dimensional features, including spatial features, temporal features, and face-specific features (i.e., portrait features and face embeddings), to provide comprehensive visual information, and take advantage of the LoRA-based instruction tuning technique to achieve quality-specific fine-tuning, which shows superior performance on both the FVQ-20K and CFVQA datasets. Extensive experiments and comprehensive analysis demonstrate the significant potential of the FVQ-20K dataset and the FVQ-Rater method in promoting the development of FVQA.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_09255
institution	arXiv
publishDate	2025
record_format	arxiv
title	FVQ: A Large-Scale Dataset and an LMM-based Method for Face Video Quality Assessment
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2504.09255
