DeepQAMVS: Query-Aware Hierarchical Pointer Networks for Multi-Video Summarization

Ismini Lourentzou

Abstract

The recent growth of web video sharing platforms has increased the demand for systems that can efficiently browse, retrieve and summarize video content. Query-aware multi-video summarization is a promising technique that caters to this demand. In this work, we introduce a novel Query-Aware Hierarchical Pointer Network for Multi-Video Summarization, termed DeepQAMVS, that jointly optimizes multiple criteria: (1) conciseness, (2) representativeness of important query-relevant events and (3) chronological soundness. We design a hierarchical attention model that factorizes over three distributions, each collecting evidence from a different modality, followed by a pointer network that selects frames to include in the summary. DeepQAMVS is trained with reinforcement learning, incorporating rewards that capture representativeness, diversity, query-adaptability and temporal coherence. We achieve state-of-the-art results on the MVS1K dataset, with inference time scaling linearly with the number of input video frames.

Publication Details

Date of publication: July 10, 2021

Conference: International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021)

Page number(s): 1389–1399

Publication Note: Safa Messaoud, Ismini Lourentzou, Assma Boughoula, Mona Zehni, Zhizhen Zhao, Chengxiang Zhai, Alexander G. Schwing: DeepQAMVS: Query-Aware Hierarchical Pointer Networks for Multi-Video Summarization. SIGIR 2021: 1389-1399