Instance level video object segmentation is an important technique for video editing and compression. To capture the temporal coherence, in this paper, we develop MaskRNN, a recurrent neural net approach which fuses in each frame the output of two deep nets for each object instance -- a binary segmentation net providing a mask and a localization net providing a bounding box. Due to the recurrent component and the localization component, our method is able to take advantage of long-term temporal structures of the video data as well as rejecting outliers. We validate the proposed algorithm on three challenging benchmark datasets, the DAVIS-2016 dataset, the DAVIS-2017 dataset, and the Segtrack v2 dataset, achieving state-of-the-art performance on all of them.
- Date of publication:
- March 29, 2018
- Cornell University
- Publication note:
Yuan-Ting Hu, Jia-Bin Huang, Alexander G. Schwing: MaskRNN: Instance Level Video Object Segmentation. CoRR abs/1803.11187 (2018)