Staff View: AdSum: Two-stream Audio-visual Summarization for Automated Video Advertisement Clipping :: Library Catalog

Bibliographic Details
Main Authors:	Xie, Wen; Zhu, Yanjun; Overgoor, Gijs; Bart, Yakov; Garcia, Agata Lapedriza; Ostadabbas, Sarah
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Information Retrieval; Multimedia; 68T05; I.4.0; H.3.1; I.2.10; K.4.4
Online Access:	https://arxiv.org/abs/2510.26569

_version_	1866909965946454016
author	Xie, Wen; Zhu, Yanjun; Overgoor, Gijs; Bart, Yakov; Garcia, Agata Lapedriza; Ostadabbas, Sarah
author_facet	Xie, Wen; Zhu, Yanjun; Overgoor, Gijs; Bart, Yakov; Garcia, Agata Lapedriza; Ostadabbas, Sarah
contents	Advertisers commonly need multiple versions of the same advertisement (ad) at varying durations for a single campaign. The traditional approach involves manually selecting and re-editing shots from longer video ads to create shorter versions, which is labor-intensive and time-consuming. In this paper, we introduce a framework for automated video ad clipping using video summarization techniques. We are the first to frame video clipping as a shot selection problem, tailored specifically for advertising. Unlike existing general video summarization methods that primarily focus on visual content, our approach emphasizes the critical role of audio in advertising. To achieve this, we develop a two-stream audio-visual fusion model that predicts the importance of video frames, where importance is defined as the likelihood of a frame being selected in the firm-produced short ad. To address the lack of ad-specific datasets, we present AdSum204, a novel dataset comprising 102 pairs of 30-second and 15-second ads from real advertising campaigns. Extensive experiments demonstrate that our model outperforms state-of-the-art methods across various metrics, including Average Precision, Area Under Curve, Spearman, and Kendall. The dataset and code are available at https://github.com/ostadabbas/AdSum204.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_26569
institution	arXiv
publishDate	2025
record_format	arxiv
title	AdSum: Two-stream Audio-visual Summarization for Automated Video Advertisement Clipping
topic	Computer Vision and Pattern Recognition; Information Retrieval; Multimedia; 68T05; I.4.0; H.3.1; I.2.10; K.4.4
url	https://arxiv.org/abs/2510.26569