MARC21 :: Library Catalog

Bibliographic Details
Main Authors:	Du, Hang; Nan, Guoshun; Qian, Jiawen; Wu, Wangchenhui; Deng, Wendi; Mu, Hanqing; Chen, Zhenyan; Mao, Pengxuan; Tao, Xiaofeng; Liu, Jun
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2412.07183

_version_	1866916515373121536
author	Du, Hang; Nan, Guoshun; Qian, Jiawen; Wu, Wangchenhui; Deng, Wendi; Mu, Hanqing; Chen, Zhenyan; Mao, Pengxuan; Tao, Xiaofeng; Liu, Jun
author_facet	Du, Hang; Nan, Guoshun; Qian, Jiawen; Wu, Wangchenhui; Deng, Wendi; Mu, Hanqing; Chen, Zhenyan; Mao, Pengxuan; Tao, Xiaofeng; Liu, Jun
contents	Recent advancements in video anomaly understanding (VAU) have opened the door to groundbreaking applications in various fields, such as traffic monitoring and industrial automation. While current VAU benchmarks predominantly emphasize the detection and localization of anomalies, here we endeavor to delve deeper into the practical aspects of VAU by addressing three essential questions: "what anomaly occurred?", "why did it happen?", and "how severe is this abnormal event?". In pursuit of these answers, we introduce a comprehensive benchmark for Exploring the Causation of Video Anomalies (ECVA). Our benchmark is meticulously designed, with each video accompanied by detailed human annotations. Specifically, each instance of ECVA carries three sets of human annotations indicating the "what", "why", and "how" of an anomaly: 1) the anomaly type, start and end times, and event descriptions; 2) natural language explanations of the cause of the anomaly; and 3) free text reflecting the effect of the abnormality. Building upon this foundation, we propose a novel prompt-based methodology that serves as a baseline for tackling the intricate challenges posed by ECVA. We use a "hard prompt" to guide the model to focus on the critical parts related to video anomaly segments, and a "soft prompt" to establish temporal and spatial relationships within these anomaly segments. Furthermore, we propose AnomEval, a specialized evaluation metric crafted to align closely with human judgment criteria for ECVA. This metric leverages the unique features of the ECVA dataset to provide a more comprehensive and reliable assessment of various video large language models. We demonstrate the efficacy of our approach through rigorous experimental analysis and delineate possible avenues for further investigation into the comprehension of video anomaly causation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_07183
institution	arXiv
publishDate	2024
record_format	arxiv
title	Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2412.07183
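The contents field above describes what each ECVA instance carries: a "what" annotation (anomaly type, start and end times, event description), a "why" explanation of the cause, and a "how" free-text account of the effect. The following is a minimal Python sketch of such a record, for illustration only. The class names, field names, the assumption that times are in seconds, and the start-before-end check are all hypothetical; they are not taken from the ECVA release or the authors' code.

```python
from dataclasses import dataclass


@dataclass
class WhatAnnotation:
    """Hypothetical 'what' part: anomaly type, time span, and event description."""
    anomaly_type: str
    start_time: float  # assumed to be seconds from the start of the video
    end_time: float    # assumed to be seconds from the start of the video
    description: str

    def __post_init__(self) -> None:
        # Illustrative sanity check (an assumption, not an ECVA rule):
        # an anomaly segment must not end before it starts.
        if self.end_time < self.start_time:
            raise ValueError("end_time must not precede start_time")


@dataclass
class ECVAInstance:
    """Hypothetical record grouping the 'what', 'why', and 'how' annotations."""
    video_id: str
    what: WhatAnnotation
    why: str  # natural-language explanation of the cause
    how: str  # free text describing the effect of the anomaly

    def duration(self) -> float:
        """Length of the annotated anomaly segment, in the same unit as the times."""
        return self.what.end_time - self.what.start_time


if __name__ == "__main__":
    # Made-up example values, not drawn from the dataset.
    instance = ECVAInstance(
        video_id="demo_0001",
        what=WhatAnnotation("traffic accident", 12.0, 18.5, "Two cars collide at a junction."),
        why="One driver ran a red light.",
        how="Both vehicles are damaged and traffic is blocked.",
    )
    print(instance.duration())
```

Keeping the "why" and "how" annotations as plain strings mirrors the abstract's description of them as natural-language and free-text fields. Only the "what" part has obvious structured components (type and time span).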

Similar Items