MARC21 :: Library Catalog

Bibliographic Details
Main Authors:	Cao, Stanley; Young, Sonny
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.18949

_version_	1866910544629334016
author	Cao, Stanley; Young, Sonny
author_facet	Cao, Stanley; Young, Sonny
contents	Image captioning using Vision Transformers (ViTs) represents a pivotal convergence of computer vision and natural language processing, offering the potential to enhance user experiences, improve accessibility, and provide textual representations of visual data. This paper explores the application of image captioning techniques to New Yorker cartoons, aiming to generate captions that emulate the wit and humor of winning entries in the New Yorker Cartoon Caption Contest. This task necessitates sophisticated visual and linguistic processing, along with an understanding of cultural nuances and humor. We propose several new baselines for using vision transformer encoder-decoder models to generate captions for the New Yorker cartoon caption contest.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_18949
institution	arXiv
publishDate	2024
record_format	arxiv
title	Predicting Winning Captions for Weekly New Yorker Comics
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2407.18949
