Bibliographic details
Main authors: Kavathia, Aarjav; Sayer, Simeon
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online access: https://arxiv.org/abs/2411.01348
author Kavathia, Aarjav
Sayer, Simeon
contents As violent crimes continue to occur, security cameras that can identify moments of violence rapidly and with high accuracy become necessary. The purpose of this study is to determine how many frames should be analyzed at a time, treated as the depth parameter of a 3D convolutional network, in order to optimize a violence detection model's accuracy. Violence classification models have been developed before, but their application to live footage may be flawed. In this project, a convolutional neural network was built to analyze the optical flow frames of each video. The number of frames analyzed at a time was set to one, two, three, ten, or twenty, and each model was trained for 20 epochs. The highest validation accuracy, 94.87%, was achieved by the model that analyzed three frames at a time. This suggests that violence detection models may perform better when analyzing three frames at a time, at least for this dataset. The methodology used to identify the optimal number of frames per analysis could be applied to other video classification tasks, especially those involving complex or abstract actions such as violence.
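The abstract varies how many optical flow frames the 3D convolutional network sees per input. A minimal Python sketch of that grouping step follows. It assumes the flow frames are already computed (the abstract does not name the flow method) and that each video is cut into consecutive, non-overlapping clips with leftover frames dropped; `make_clips`, `DEPTHS`, and the 60-frame example video are hypothetical illustrations, not the authors' code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Frame counts compared in the study (one, two, three, ten, twenty).
DEPTHS = (1, 2, 3, 10, 20)


def make_clips(frames: Sequence[T], frames_per_clip: int) -> List[List[T]]:
    """Split a video's optical flow frames into consecutive, non-overlapping
    clips of `frames_per_clip` frames; each clip is one 3D-CNN input whose
    temporal depth equals `frames_per_clip`.

    Assumption: trailing frames that cannot fill a whole clip are dropped.
    """
    if frames_per_clip < 1:
        raise ValueError("frames_per_clip must be at least 1")
    n_full = len(frames) // frames_per_clip  # number of complete clips
    return [
        list(frames[i * frames_per_clip:(i + 1) * frames_per_clip])
        for i in range(n_full)
    ]


if __name__ == "__main__":
    # Hypothetical 60-frame video; integers stand in for optical flow frames.
    video = list(range(60))
    for depth in DEPTHS:
        clips = make_clips(video, depth)
        print(f"depth={depth}: {len(clips)} clips of {depth} frame(s)")
```

Each clip would then be stacked into one input tensor whose depth dimension matches the frame count under test. A sliding window with overlapping clips is an equally plausible design; the abstract does not say which one was used.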
format Preprint
id arxiv_https___arxiv_org_abs_2411_01348
institution arXiv
publishDate 2024
record_format arxiv
title Optimizing Violence Detection in Video Classification Accuracy through 3D Convolutional Neural Networks
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2411.01348