Bibliographic Details
Main Authors: Xie, Wen; Zhu, Yanjun; Overgoor, Gijs; Bart, Yakov; Garcia, Agata Lapedriza; Ostadabbas, Sarah
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Information Retrieval; Multimedia
Online Access: https://arxiv.org/abs/2510.26569
author Xie, Wen
Zhu, Yanjun
Overgoor, Gijs
Bart, Yakov
Garcia, Agata Lapedriza
Ostadabbas, Sarah
contents Advertisers commonly need multiple versions of the same advertisement (ad) at varying durations for a single campaign. The traditional approach involves manually selecting and re-editing shots from longer video ads to create shorter versions, which is labor-intensive and time-consuming. In this paper, we introduce a framework for automated video ad clipping using video summarization techniques. We are the first to frame video clipping as a shot selection problem, tailored specifically for advertising. Unlike existing general video summarization methods that primarily focus on visual content, our approach emphasizes the critical role of audio in advertising. To achieve this, we develop a two-stream audio-visual fusion model that predicts the importance of video frames, where importance is defined as the likelihood of a frame being selected in the firm-produced short ad. To address the lack of ad-specific datasets, we present AdSum204, a novel dataset comprising 102 pairs of 30-second and 15-second ads from real advertising campaigns. Extensive experiments demonstrate that our model outperforms state-of-the-art methods across various metrics, including Average Precision, Area Under Curve, Spearman, and Kendall. The dataset and code are available at https://github.com/ostadabbas/AdSum204.
format Preprint
id arxiv_https___arxiv_org_abs_2510_26569
institution arXiv
publishDate 2025
record_format arxiv
title AdSum: Two-stream Audio-visual Summarization for Automated Video Advertisement Clipping
topic Computer Vision and Pattern Recognition
Information Retrieval
Multimedia
68T05
I.4.0; H.3.1; I.2.10; K.4.4
url https://arxiv.org/abs/2510.26569