Bibliographic Details
Main Authors: Prasad, Ashish; Jeevan, Pranav; Sethi, Amit
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2409.14724
author Prasad, Ashish
Jeevan, Pranav
Sethi, Amit
contents Current video summarization methods largely rely on transformer-based architectures, which, due to their quadratic complexity, require substantial computational resources. In this work, we address these inefficiencies by enhancing the Detect-to-Summarize Network (DSNet) with more resource-efficient token mixing mechanisms. We show that replacing traditional attention with alternatives such as Fourier transforms, wavelet transforms, and Nyströmformer improves efficiency and performance. Furthermore, we explore various pooling strategies within the Region Proposal Network, including ROI pooling, Fast Fourier Transform pooling, and flat pooling. Our experimental results on the TVSum and SumMe datasets demonstrate that these modifications significantly reduce computational costs while maintaining competitive summarization performance. Thus, our work offers a more scalable solution for video summarization tasks.
format Preprint
id arxiv_https___arxiv_org_abs_2409_14724
institution arXiv
publishDate 2024
record_format arxiv
title EDSNet: Efficient-DSNet for Video Summarization
topic Computer Vision and Pattern Recognition
Artificial Intelligence
Machine Learning
I.4.10; I.4.0; I.4.9; I.2.10
url https://arxiv.org/abs/2409.14724
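note The abstract above mentions replacing attention with Fourier-based token mixing. As a rough illustration only, and not the authors' implementation, the sketch below assumes an FNet-style mixer: a 2D discrete Fourier transform over the frame and feature axes, keeping the real part. The function names, the residual connection, and the parameter-free normalization are hypothetical choices made for this example. The FFT along the frame axis costs O(n log n) in the number of frames n, whereas self-attention costs O(n^2).

```python
import numpy as np


def fourier_token_mixing(frame_features: np.ndarray) -> np.ndarray:
    """Mix information across frames and channels with a 2D DFT.

    frame_features: real array of shape (num_frames, feature_dim).
    Returns the real part of the 2D DFT, with the same shape.
    This is an assumed FNet-style mixer, not the record's actual method.
    """
    return np.fft.fft2(frame_features).real


def mixing_block(frame_features: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Hypothetical block: residual connection plus per-frame normalization."""
    mixed = frame_features + fourier_token_mixing(frame_features)
    mean = mixed.mean(axis=-1, keepdims=True)
    std = mixed.std(axis=-1, keepdims=True)
    # Normalize each frame's feature vector to zero mean and unit scale.
    return (mixed - mean) / (std + eps)


# Toy input: 16 frames, each with an 8-dimensional feature vector.
rng = np.random.default_rng(0)
features = rng.standard_normal((16, 8))
out = mixing_block(features)
```

The mixer has no learned parameters, so any saving over attention comes from both the lower asymptotic cost and the absence of query, key, and value projections.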