Bibliographic Details
Main Authors: Liu, Wenzhuo, Wang, Wenshuo, Qiao, Yicheng, Guo, Qiannan, Zhu, Jiayin, Li, Pengfei, Chen, Zilong, Yang, Huiming, Li, Zhiwei, Wang, Lening, Tan, Tiao, Liu, Huaping
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2504.02264
author Liu, Wenzhuo
Wang, Wenshuo
Qiao, Yicheng
Guo, Qiannan
Zhu, Jiayin
Li, Pengfei
Chen, Zilong
Yang, Huiming
Li, Zhiwei
Wang, Lening
Tan, Tiao
Liu, Huaping
contents Advanced driver assistance systems require a comprehensive understanding of the driver's mental and physical state and of the traffic context, but existing work often neglects the potential benefits of learning these tasks jointly. This paper proposes MMTL-UniAD, a unified multimodal multi-task learning framework that simultaneously recognizes driver behavior (e.g., looking around, talking), driver emotion (e.g., anxiety, happiness), vehicle behavior (e.g., parking, turning), and traffic context (e.g., traffic jam, smooth traffic). A key challenge is avoiding negative transfer between tasks, which can impair learning performance. To address this, we introduce two key components: a multi-axis region attention network that extracts global context-sensitive features, and a dual-branch multimodal embedding that learns multimodal embeddings from both task-shared and task-specific features. The former uses a multi-attention mechanism to extract task-relevant features, mitigating negative transfer caused by task-unrelated features. The latter employs a dual-branch structure to adaptively adjust task-shared and task-specific parameters, enhancing cross-task knowledge transfer while reducing task conflicts. We evaluate MMTL-UniAD on the AIDE dataset, together with a series of ablation studies, and show that it outperforms state-of-the-art methods across all four tasks. The code is available at https://github.com/Wenzhuo-Liu/MMTL-UniAD.
format Preprint
id arxiv_https___arxiv_org_abs_2504_02264
institution arXiv
publishDate 2025
record_format arxiv
title MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2504.02264
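
The abstract describes a dual-branch embedding that combines task-shared and task-specific features for four recognition tasks. The following is only a minimal sketch of that general idea, not the authors' implementation: the class name `DualBranchSketch`, the helper functions, the dimensions, the random weights, and the scalar per-task gate are all hypothetical choices made for illustration. In a trained model the gate and weights would be learned; here they are fixed.

```python
import math
import random

# The four tasks named in the abstract.
TASKS = ["driver_behavior", "driver_emotion", "vehicle_behavior", "traffic_context"]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class DualBranchSketch:
    """Hypothetical illustration: one shared branch plus one branch per task,
    blended by a per-task gate. Not the MMTL-UniAD implementation."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        self.shared = random_matrix(hid_dim, in_dim, rng)
        self.specific = {t: random_matrix(hid_dim, in_dim, rng) for t in TASKS}
        # Assumption: one scalar gate logit per task; sigmoid(logit) is the
        # weight given to the shared branch for that task.
        self.gate_logit = {t: 0.0 for t in TASKS}

    def forward(self, x):
        shared_feat = linear(self.shared, x)  # computed once, reused by all tasks
        out = {}
        for t in TASKS:
            specific_feat = linear(self.specific[t], x)
            g = sigmoid(self.gate_logit[t])
            out[t] = [g * s + (1.0 - g) * p
                      for s, p in zip(shared_feat, specific_feat)]
        return out


model = DualBranchSketch(in_dim=4, hid_dim=3)
embeddings = model.forward([0.2, -0.1, 0.4, 0.0])
```

Each entry of `embeddings` is a per-task vector that a task-specific classifier head could consume. Raising a task's gate logit moves its embedding toward the shared features, and lowering it moves the embedding toward the task-specific ones, which is one simple way to trade cross-task transfer against task conflict.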