Bibliographic Details
Main Authors: Du, Hang; Nan, Guoshun; Qian, Jiawen; Wu, Wangchenhui; Deng, Wendi; Mu, Hanqing; Chen, Zhenyan; Mao, Pengxuan; Tao, Xiaofeng; Liu, Jun
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access: https://arxiv.org/abs/2412.07183
author Du, Hang
Nan, Guoshun
Qian, Jiawen
Wu, Wangchenhui
Deng, Wendi
Mu, Hanqing
Chen, Zhenyan
Mao, Pengxuan
Tao, Xiaofeng
Liu, Jun
contents Recent advancements in video anomaly understanding (VAU) have opened the door to groundbreaking applications in various fields, such as traffic monitoring and industrial automation. However, current VAU benchmarks predominantly emphasize the detection and localization of anomalies. Here, we endeavor to delve deeper into the practical aspects of VAU by addressing three essential questions: "what anomaly occurred?", "why did it happen?", and "how severe is this abnormal event?". In pursuit of these answers, we introduce a comprehensive benchmark for Exploring the Causation of Video Anomalies (ECVA). Our benchmark is meticulously designed, with each video accompanied by detailed human annotations. Specifically, each instance of ECVA carries three sets of human annotations indicating the "what", "why", and "how" of an anomaly: 1) the anomaly type, its start and end times, and event descriptions; 2) natural language explanations of the cause of the anomaly; and 3) free text describing the effect of the abnormality. Building upon this foundation, we propose a novel prompt-based methodology that serves as a baseline for tackling the intricate challenges posed by ECVA. We use "hard prompts" to guide the model to focus on the critical parts related to video anomaly segments, and "soft prompts" to establish temporal and spatial relationships within these anomaly segments. Furthermore, we propose AnomEval, a specialized evaluation metric crafted to align closely with human judgment criteria for ECVA. This metric leverages the unique features of the ECVA dataset to provide a more comprehensive and reliable assessment of various video large language models. We demonstrate the efficacy of our approach through rigorous experimental analysis and delineate possible avenues for further investigation into the comprehension of video anomaly causation.
format Preprint
id arxiv_https___arxiv_org_abs_2412_07183
institution arXiv
publishDate 2024
record_format arxiv
title Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2412.07183