Staff View: VADMamba: Exploring State Space Models for Fast Video Anomaly Detection :: Library Catalog

Bibliographic Details
Main Authors:	Lyu, Jiahao; Zhao, Minghua; Hu, Jing; Huang, Xuewen; Chen, Yifei; Du, Shuangli
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.21169

_version_	1866908801249050624
author	Lyu, Jiahao; Zhao, Minghua; Hu, Jing; Huang, Xuewen; Chen, Yifei; Du, Shuangli
author_facet	Lyu, Jiahao; Zhao, Minghua; Hu, Jing; Huang, Xuewen; Chen, Yifei; Du, Shuangli
contents	Video anomaly detection (VAD) methods are mostly CNN-based or Transformer-based, achieving impressive results, but the focus on detection accuracy often comes at the expense of inference speed. The emergence of state space models in computer vision, exemplified by the Mamba model, demonstrates improved computational efficiency through selective scans and showcases the great potential for long-range modeling. Our study pioneers the application of Mamba to VAD, dubbed VADMamba, which is based on multi-task learning for frame prediction and optical flow reconstruction. Specifically, we propose the VQ-Mamba Unet (VQ-MaU) framework, which incorporates a Vector Quantization (VQ) layer and Mamba-based Non-negative Visual State Space (NVSS) block. Furthermore, two individual VQ-MaU networks separately predict frames and reconstruct corresponding optical flows, further boosting accuracy through a clip-level fusion evaluation strategy. Experimental results validate the efficacy of the proposed VADMamba across three benchmark datasets, demonstrating superior performance in inference speed compared to previous work. Code is available at https://github.com/jLooo/VADMamba.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_21169
institution	arXiv
publishDate	2025
record_format	arxiv
title	VADMamba: Exploring State Space Models for Fast Video Anomaly Detection
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2503.21169
