Staff View: DLM-VMTL: A Double Layer Mapper for heterogeneous data video Multi-task prompt learning :: Library Catalog

Bibliographic Details
Main Authors:	Bo, Zeyi; Sun, Wuxi; Jin, Ye
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2408.16195

_version_	1866910581335785472
author	Bo, Zeyi; Sun, Wuxi; Jin, Ye
author_facet	Bo, Zeyi; Sun, Wuxi; Jin, Ye
contents	In recent years, the parameter counts of backbones for video understanding tasks have continued to grow, in some cases reaching the billion level. Whether one fine-tunes a Video Foundation Model on a specific task or pre-trains a model designed for that task, the overhead is substantial. How to make these models provide value beyond their own tasks is therefore a question worth asking. Multi-Task Learning (MTL) lets a visual task acquire rich shareable knowledge from other tasks during joint training. It has been thoroughly explored in image recognition, especially for dense prediction tasks. Nevertheless, it is rarely used in the video domain because multi-label video data is scarce. In this paper, a heterogeneous-data video multi-task prompt learning (VMTL) method is proposed to address this problem. Unlike approaches in the image domain, it uses a Double-Layer Mapper (DLM) to extract the shareable knowledge into visual prompts and align it with the representation of the primary task. Extensive experiments show that DLM-VMTL performs better than baselines on 6 different video understanding tasks and 11 datasets.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_16195
institution	arXiv
publishDate	2024
record_format	arxiv
title	DLM-VMTL: A Double Layer Mapper for heterogeneous data video Multi-task prompt learning
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2408.16195
