Bibliographic Details
Main Authors: Gupta, Pranav; Krishnan, Advith; Nanda, Naman; Eswar, Ananth; Agarwal, Deeksha; Gohil, Pratham; Goel, Pratyush
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2410.00477
_version_ 1866916634630815744
author Gupta, Pranav
Krishnan, Advith
Nanda, Naman
Eswar, Ananth
Agarwal, Deeksha
Gohil, Pratham
Goel, Pratyush
contents We present a novel dataset for danger analysis and assessment that addresses two questions: how to quantify danger in video content, and how human-like a Large Language Model (LLM) evaluator is at that task. The dataset comprises 100 YouTube videos featuring various events. Human participants annotated each video with a danger rating on a scale from 0 (no danger to humans) to 10 (life-threatening), along with precise timestamps marking moments of heightened danger. Additionally, we use LLMs to independently assess the danger level of each video from video summaries. We introduce Mean Squared Error (MSE) scores for multimodal meta-evaluation of the alignment between human and LLM danger assessments. The dataset contributes a new resource for danger assessment in video content and demonstrates the potential of LLMs to achieve human-like evaluations.
format Preprint
id arxiv_https___arxiv_org_abs_2410_00477
institution arXiv
publishDate 2024
record_format arxiv
title ViDAS: Vision-based Danger Assessment and Scoring
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2410.00477
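
The abstract describes scoring human and LLM agreement with Mean Squared Error over danger ratings on a 0 to 10 scale. The snippet below is a minimal sketch of that idea, not the authors' implementation: it assumes one aggregated human rating and one LLM rating per video, and the function name `mean_squared_error` and the sample ratings are hypothetical, chosen only for illustration.

```python
from statistics import mean


def mean_squared_error(human_scores, llm_scores):
    """Return the MSE between paired human and LLM danger ratings.

    Assumption: each position holds the ratings for the same video,
    both on the 0 (no danger) to 10 (life-threatening) scale.
    """
    if len(human_scores) != len(llm_scores):
        raise ValueError("rating lists must have the same length")
    if not human_scores:
        raise ValueError("rating lists must not be empty")
    # Average of the squared per-video differences.
    return mean((h - l) ** 2 for h, l in zip(human_scores, llm_scores))


# Hypothetical ratings for four videos (not taken from the dataset).
human = [2.0, 7.5, 9.0, 0.0]
llm = [3.0, 7.0, 10.0, 0.0]
print(mean_squared_error(human, llm))  # 0.5625
```

A lower value means the LLM's ratings sit closer to the human ones; 0 would mean identical ratings on every video. How the paper aggregates multiple human annotators per video before computing the score is not stated in this record.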