Staff View: ViDAS: Vision-based Danger Assessment and Scoring :: Library Catalog

Bibliographic Details
Main Authors:	Gupta, Pranav; Krishnan, Advith; Nanda, Naman; Eswar, Ananth; Agarwal, Deeksha; Gohil, Pratham; Goel, Pratyush
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.00477
_version_	1866916634630815744
author	Gupta, Pranav; Krishnan, Advith; Nanda, Naman; Eswar, Ananth; Agarwal, Deeksha; Gohil, Pratham; Goel, Pratyush
author_facet	Gupta, Pranav; Krishnan, Advith; Nanda, Naman; Eswar, Ananth; Agarwal, Deeksha; Gohil, Pratham; Goel, Pratyush
contents	We present a novel dataset aimed at advancing danger analysis and assessment by addressing the challenge of quantifying danger in video content and identifying how human-like a Large Language Model (LLM) evaluator is for the same. This is achieved by compiling a collection of 100 YouTube videos featuring various events. Each video is annotated by human participants who provided danger ratings on a scale from 0 (no danger to humans) to 10 (life-threatening), with precise timestamps indicating moments of heightened danger. Additionally, we leverage LLMs to independently assess the danger levels in these videos using video summaries. We introduce Mean Squared Error (MSE) scores for multimodal meta-evaluation of the alignment between human and LLM danger assessments. Our dataset not only contributes a new resource for danger assessment in video content but also demonstrates the potential of LLMs in achieving human-like evaluations.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_00477
institution	arXiv
publishDate	2024
record_format	arxiv
title	ViDAS: Vision-based Danger Assessment and Scoring
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2410.00477