Progressive Video Summarization via Multimodal Self-supervised Learning

H Li; Q Ke; M Gong; T Drummond

Conference Proceedings

Proceedings of the 2023 IEEE Winter Conference on Applications of Computer Vision (WACV 2023) | Published: 2023

DOI: 10.1109/WACV56688.2023.00554

Abstract

Modern video summarization methods are based on deep neural networks that require a large amount of annotated data for training. However, existing datasets for video summarization are small-scale, easily leading to over-fitting of the deep models. Considering that the annotation of large-scale datasets is time-consuming, we propose a multimodal self-supervised learning framework to obtain semantic representations of videos, which benefits the video summarization task. Specifically, the self-supervised learning is conducted by exploring the semantic consistency between the videos and text in both coarse-grained and fine-grained fashions, as well as recovering masked frames in the videos. The ..
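The abstract names two self-supervised signals: semantic consistency between videos and text, and recovery of masked frames. The following is a minimal, non-authoritative sketch of what such objectives commonly look like, assuming a CLIP-style symmetric InfoNCE loss for the coarse-grained (whole video vs. whole text) consistency and a mean-squared-error loss restricted to masked frames for recovery. The paper's actual encoders, losses, fine-grained alignment and weighting are not given in this record; the function names `info_nce` and `masked_frame_loss` and the temperature of 0.07 are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def _row_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(video_emb: List[Vector], text_emb: List[Vector],
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE (hypothetical coarse-grained objective).

    The i-th video and i-th text form a positive pair; every other
    pairing in the batch is treated as a negative.
    """
    n = len(video_emb)
    logits = [[cosine(v, t) / temperature for t in text_emb] for v in video_emb]
    transposed = [[logits[i][j] for i in range(n)] for j in range(n)]
    # Average the video-to-text and text-to-video directions.
    return 0.5 * (_row_cross_entropy(logits) + _row_cross_entropy(transposed))


def masked_frame_loss(predicted: List[Vector], target: List[Vector],
                      mask: List[bool]) -> float:
    """Mean squared error over masked frames only (hypothetical recovery objective)."""
    errors = [
        sum((p - t) ** 2 for p, t in zip(pf, tf)) / len(tf)
        for pf, tf, masked in zip(predicted, target, mask)
        if masked
    ]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    videos = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[0.9, 0.1], [0.1, 0.9]]
    # Correctly paired embeddings should give a lower loss than mismatched ones.
    print(info_nce(videos, texts) < info_nce(videos, texts[::-1]))  # True
    # Only the second frame is masked, so only its error (4.0) counts.
    print(masked_frame_loss([[1.0, 1.0], [0.0, 0.0]],
                            [[1.0, 1.0], [2.0, 2.0]],
                            [False, True]))  # 4.0
```

In a sketch like this, the two losses would typically be summed with a tunable weight during pre-training; the fine-grained (e.g. frame-to-word) alignment mentioned in the abstract is not modelled here.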

University of Melbourne Researchers

Mingming Gong (Author)

Tom Drummond (Author)

Related Projects

A HIGH-PERFORMANCE CLOUD RESOURCE FOR COMPUTATIONAL MODELLING

This project aims to build a relatively low-cost graphical-processing-unit-based cloud-accessible facility. Much current cutting-edge resear..

Grants

Awarded by Australian Research Council

Funding Acknowledgements

This research was undertaken using the LIEF HPC-GPGPU Facility hosted at the University of Melbourne. This Facility was established with the assistance of LIEF Grant LE170100200. MG was supported by ARC DE210101624.