Conference Proceedings
Progressive Video Summarization via Multimodal Self-supervised Learning
H Li, Q Ke, M Gong, T Drummond
Proceedings 2023 IEEE Winter Conference on Applications of Computer Vision (WACV 2023) | Published: 2023
Abstract
Modern video summarization methods are based on deep neural networks that require a large amount of annotated data for training. However, existing datasets for video summarization are small-scale, easily leading to over-fitting of the deep models. Considering that the annotation of large-scale datasets is time-consuming, we propose a multimodal self-supervised learning framework to obtain semantic representations of videos, which benefits the video summarization task. Specifically, the self-supervised learning is conducted by exploring the semantic consistency between the videos and text in both coarse-grained and fine-grained fashions, as well as recovering masked frames in the videos. The ..
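The abstract names two self-supervised signals: video-text semantic consistency and recovery of masked frames. The sketch below is a rough, hypothetical illustration in plain Python, not the authors' implementation. It uses a symmetric InfoNCE contrastive loss for coarse-grained video-text alignment, which is a common choice but is assumed here rather than confirmed by the abstract. It also includes a masked-frame reconstruction loss. The function names (`info_nce`, `masked_frame_loss`), the temperature `tau = 0.07` and the mean-of-visible-frames predictor are all assumptions. The fine-grained consistency term is not modelled.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def _cross_entropy(scores: Sequence[float], target: int) -> float:
    """-log softmax(scores)[target], shifted by the max for numerical stability."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]


def info_nce(video: List[Vector], text: List[Vector], tau: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch (assumed loss, hypothetical name).

    Video i is paired with text i; every other item in the batch is a negative.
    """
    n = len(video)
    sims = [[cosine(v, t) / tau for t in text] for v in video]
    video_to_text = sum(_cross_entropy(sims[i], i) for i in range(n)) / n
    text_to_video = sum(
        _cross_entropy([sims[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return (video_to_text + text_to_video) / 2


def masked_frame_loss(frames: List[Vector], mask: List[bool]) -> float:
    """Mean squared error computed on masked frames only.

    Hypothetical placeholder predictor: each masked frame is "recovered" as the
    mean of the visible frames; a trained network would replace this.
    Assumes at least one frame is masked and at least one is visible.
    """
    visible = [f for f, m in zip(frames, mask) if not m]
    dim = len(frames[0])
    guess = [sum(f[d] for f in visible) / len(visible) for d in range(dim)]
    errors = [
        sum((f[d] - guess[d]) ** 2 for d in range(dim)) / dim
        for f, m in zip(frames, mask)
        if m
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Matched video-text pairs should score a lower loss than mismatched ones.
    aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)  # True
```

In a full system the two losses would presumably be combined, for example as a weighted sum, with learned encoders producing the video and text embeddings. Any such weighting is an assumption here and is not given in the abstract.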
Grants
Awarded by Australian Research Council
Funding Acknowledgements
This research was undertaken using the LIEF HPC-GPGPU Facility hosted at the University of Melbourne. This Facility was established with the assistance of LIEF Grant LE170100200. MG was supported by ARC DE210101624.