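Staff View: MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception :: Library Catalog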

Bibliographic Details
Main Authors:	Liu, Wenzhuo; Wang, Wenshuo; Qiao, Yicheng; Guo, Qiannan; Zhu, Jiayin; Li, Pengfei; Chen, Zilong; Yang, Huiming; Li, Zhiwei; Wang, Lening; Tan, Tiao; Liu, Huaping
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.02264

_version_	1866910902538731520
author	Liu, Wenzhuo; Wang, Wenshuo; Qiao, Yicheng; Guo, Qiannan; Zhu, Jiayin; Li, Pengfei; Chen, Zilong; Yang, Huiming; Li, Zhiwei; Wang, Lening; Tan, Tiao; Liu, Huaping
author_facet	Liu, Wenzhuo; Wang, Wenshuo; Qiao, Yicheng; Guo, Qiannan; Zhu, Jiayin; Li, Pengfei; Chen, Zilong; Yang, Huiming; Li, Zhiwei; Wang, Lening; Tan, Tiao; Liu, Huaping
contents	Advanced driver assistance systems require a comprehensive understanding of the driver's mental/physical state and the traffic context, but existing works often neglect the potential benefits of joint learning between these tasks. This paper proposes MMTL-UniAD, a unified multi-modal multi-task learning framework that simultaneously recognizes driver behavior (e.g., looking around, talking), driver emotion (e.g., anxiety, happiness), vehicle behavior (e.g., parking, turning), and traffic context (e.g., traffic jam, traffic smooth). A key challenge is avoiding negative transfer between tasks, which can impair learning performance. To address this, we introduce two key components into the framework: a multi-axis region attention network that extracts global context-sensitive features, and a dual-branch multimodal embedding that learns multimodal embeddings from both task-shared and task-specific features. The former uses a multi-attention mechanism to extract task-relevant features, mitigating negative transfer caused by task-unrelated features. The latter employs a dual-branch structure to adaptively adjust task-shared and task-specific parameters, enhancing cross-task knowledge transfer while reducing task conflicts. We assess MMTL-UniAD on the AIDE dataset, using a series of ablation studies, and show that it outperforms state-of-the-art methods across all four tasks. The code is available at https://github.com/Wenzhuo-Liu/MMTL-UniAD.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_02264
institution	arXiv
publishDate	2025
record_format	arxiv
title	MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2504.02264

Similar Items