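Staff View: CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes :: Library Catalog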

Bibliographic Details
Main Authors:	Parmar, Paritosh; Peh, Eric; Chen, Ruirui; Lam, Ting En; Chen, Yuhan; Tan, Elston; Fernando, Basura
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2404.01299

_version_	1866911917598048256
author	Parmar, Paritosh; Peh, Eric; Chen, Ruirui; Lam, Ting En; Chen, Yuhan; Tan, Elston; Fernando, Basura
author_facet	Parmar, Paritosh; Peh, Eric; Chen, Ruirui; Lam, Ting En; Chen, Yuhan; Tan, Elston; Fernando, Basura
contents	Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic "Tom and Jerry" cartoon series. Cartoons use the principles of animation, which allow animators to create expressive, unambiguous causal relationships between events to form a coherent storyline. Utilizing these properties, along with thought-provoking questions and multi-level answers (answer and detailed causal explanation), our questions involve causal chains that interconnect multiple dynamic interactions between characters and visual scenes. These factors require models to resolve more challenging, yet well-defined, causal relationships. We also introduce hard incorrect answer mining, including a causally confusing version that is even more challenging. While models perform well, there is much room for improvement, especially on open-ended answers. We identify more advanced/explicit causal relationship modeling and joint modeling of vision and language as the immediate areas for future work. Along with other complementary datasets, our new challenging dataset will pave the way for these developments in the field.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_01299
institution	arXiv
publishDate	2024
record_format	arxiv
title	CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
topic	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Machine Learning
url	https://arxiv.org/abs/2404.01299