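Staff View: UV-M3TL: A Unified and Versatile Multimodal Multi-Task Learning Framework for Assistive Driving Perception :: Library Catalog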

Bibliographic Details
Main Authors:	Liu, Wenzhuo; Guo, Qiannan; Wang, Zhen; Wang, Wenshuo; Yang, Lei; Qiao, Yicheng; Wang, Lening; Li, Zhiwei; Lv, Chen; Zhang, Shanghang; Xi, Junqiang; Liu, Huaping
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.01594

_version_	1866914300664217600
author	Liu, Wenzhuo; Guo, Qiannan; Wang, Zhen; Wang, Wenshuo; Yang, Lei; Qiao, Yicheng; Wang, Lening; Li, Zhiwei; Lv, Chen; Zhang, Shanghang; Xi, Junqiang; Liu, Huaping
author_facet	Liu, Wenzhuo; Guo, Qiannan; Wang, Zhen; Wang, Wenshuo; Yang, Lei; Qiao, Yicheng; Wang, Lening; Li, Zhiwei; Lv, Chen; Zhang, Shanghang; Xi, Junqiang; Liu, Huaping
contents	Advanced Driver Assistance Systems (ADAS) must understand human driver behavior while perceiving the navigation context, but jointly learning these heterogeneous tasks can cause inter-task negative transfer and impair system performance. Here, we propose a Unified and Versatile Multimodal Multi-Task Learning (UV-M3TL) framework that simultaneously recognizes driver behavior, driver emotion, vehicle behavior, and traffic context while mitigating inter-task negative transfer. The framework has two core components: dual-branch spatial channel multimodal embedding (DB-SCME) and adaptive feature-decoupled multi-task loss (AFD-Loss). DB-SCME enhances cross-task knowledge transfer and mitigates task conflicts by using a dual-branch structure to explicitly model salient task-shared and task-specific features. AFD-Loss improves the stability of joint optimization and guides the model to learn diverse multi-task representations by introducing an adaptive weighting mechanism based on learning dynamics, together with feature decoupling constraints. We evaluate our method on the AIDE dataset, and the experimental results show that UV-M3TL achieves state-of-the-art performance across all four tasks. To further demonstrate its versatility, we evaluate UV-M3TL on additional public multi-task perception benchmarks (BDD100K, Cityscapes, NYUD-v2, and PASCAL-Context), where it consistently delivers strong performance across diverse task combinations, attaining state-of-the-art results on most tasks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_01594
institution	arXiv
publishDate	2026
record_format	arxiv
title	UV-M3TL: A Unified and Versatile Multimodal Multi-Task Learning Framework for Assistive Driving Perception
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.01594

Similar Items