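| Main Authors: | Liu, Wenzhuo; Wang, Wenshuo; Qiao, Yicheng; Guo, Qiannan; Zhu, Jiayin; Li, Pengfei; Chen, Zilong; Yang, Huiming; Li, Zhiwei; Wang, Lening; Tan, Tiao; Liu, Huaping |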
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2504.02264 |
| _version_ | 1866910902538731520 |
|---|---|
| author | Liu, Wenzhuo; Wang, Wenshuo; Qiao, Yicheng; Guo, Qiannan; Zhu, Jiayin; Li, Pengfei; Chen, Zilong; Yang, Huiming; Li, Zhiwei; Wang, Lening; Tan, Tiao; Liu, Huaping |
| author_facet | Liu, Wenzhuo; Wang, Wenshuo; Qiao, Yicheng; Guo, Qiannan; Zhu, Jiayin; Li, Pengfei; Chen, Zilong; Yang, Huiming; Li, Zhiwei; Wang, Lening; Tan, Tiao; Liu, Huaping |
| contents | Advanced driver assistance systems require a comprehensive understanding of the driver's mental/physical state and traffic context, but existing works often neglect the potential benefits of joint learning between these tasks. This paper proposes MMTL-UniAD, a unified multi-modal multi-task learning framework that simultaneously recognizes driver behavior (e.g., looking around, talking), driver emotion (e.g., anxiety, happiness), vehicle behavior (e.g., parking, turning), and traffic context (e.g., traffic jam, traffic smooth). A key challenge is avoiding negative transfer between tasks, which can impair learning performance. To address this, we introduce two key components into the framework: a multi-axis region attention network that extracts global context-sensitive features, and a dual-branch multimodal embedding that learns multimodal embeddings from both task-shared and task-specific features. The former uses a multi-attention mechanism to extract task-relevant features, mitigating negative transfer caused by task-unrelated features. The latter employs a dual-branch structure to adaptively adjust task-shared and task-specific parameters, enhancing cross-task knowledge transfer while reducing task conflicts. We assess MMTL-UniAD on the AIDE dataset, using a series of ablation studies, and show that it outperforms state-of-the-art methods across all four tasks. The code is available at https://github.com/Wenzhuo-Liu/MMTL-UniAD. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2504_02264 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2504.02264 |