Bibliographic Details
Main Authors: Liu, Wenzhuo, Guo, Qiannan, Wang, Zhen, Wang, Wenshuo, Yang, Lei, Qiao, Yicheng, Wang, Lening, Li, Zhiwei, Lv, Chen, Zhang, Shanghang, Xi, Junqiang, Liu, Huaping
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.01594
author Liu, Wenzhuo
Guo, Qiannan
Wang, Zhen
Wang, Wenshuo
Yang, Lei
Qiao, Yicheng
Wang, Lening
Li, Zhiwei
Lv, Chen
Zhang, Shanghang
Xi, Junqiang
Liu, Huaping
contents Advanced Driver Assistance Systems (ADAS) need to understand human driver behavior while perceiving their navigation context, but jointly learning these heterogeneous tasks can cause inter-task negative transfer and impair system performance. Here, we propose a Unified and Versatile Multimodal Multi-Task Learning (UV-M3TL) framework to simultaneously recognize driver behavior, driver emotion, vehicle behavior, and traffic context, while mitigating inter-task negative transfer. Our framework incorporates two core components: dual-branch spatial channel multimodal embedding (DB-SCME) and adaptive feature-decoupled multi-task loss (AFD-Loss). DB-SCME enhances cross-task knowledge transfer while mitigating task conflicts by employing a dual-branch structure to explicitly model salient task-shared and task-specific features. AFD-Loss improves the stability of joint optimization while guiding the model to learn diverse multi-task representations by introducing an adaptive weighting mechanism based on learning dynamics and feature decoupling constraints. We evaluate our method on the AIDE dataset, and the experimental results demonstrate that UV-M3TL achieves state-of-the-art performance across all four tasks. To further demonstrate its versatility, we evaluate UV-M3TL on additional public multi-task perception benchmarks (BDD100K, CityScapes, NYUD-v2, and PASCAL-Context), where it consistently delivers strong performance across diverse task combinations, attaining state-of-the-art results on most tasks.
format Preprint
id arxiv_https___arxiv_org_abs_2602_01594
institution arXiv
publishDate 2026
record_format arxiv
title UV-M3TL: A Unified and Versatile Multimodal Multi-Task Learning Framework for Assistive Driving Perception
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.01594