Efficient Implicit Unsupervised Text Hashing using Adversarial Autoencoder

Khoa Doan, Chandan Reddy

Abstract

Searching for documents with semantically similar content is a fundamental problem in information retrieval, and it poses challenges primarily in terms of efficiency and effectiveness. Despite the promise of modeling structured dependencies in documents, several existing text hashing methods lack an efficient mechanism to incorporate such vital information. Additionally, existing methods do not effectively learn the desired characteristics of an ideal hash function, such as robustness to noise, low quantization error, and bit balance/uncorrelation, because they require either tuning additional hyper-parameters or optimizing heuristically and explicitly constructed cost functions. In this paper, we propose a Denoising Adversarial Binary Autoencoder (DABA) model, a novel representation learning framework that captures structured representations of text documents in the learned hash function. Adversarial training also provides an alternative direction: it implicitly learns a hash function that captures all the desired characteristics of an ideal hash function. Essentially, DABA adopts a novel single-optimization adversarial training procedure that minimizes the Wasserstein distance in its primal domain to regularize the encoder's output in either a recurrent or a convolutional autoencoder. We empirically demonstrate the effectiveness of our proposed method in capturing the intrinsic semantic manifold of related documents. The proposed method outperforms current state-of-the-art shallow and deep unsupervised hashing methods for the document retrieval task on several prominent document collections.
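The abstract refers to binary hash codes, their desired properties (low quantization error, bit balance), and a Wasserstein distance minimized in its primal domain. The Python sketch below illustrates these quantities under stated assumptions; it is not the authors' DABA implementation and does no training. It assumes encoder outputs in [0, 1] binarized at 0.5, an L1 ground cost, and samples of independent Bernoulli(0.5) bits as the target distribution for the regularizer. All function names are hypothetical. For two equal-size batches with uniform weights, an optimal transport plan can always be chosen to be a permutation, so the brute-force assignment search in `primal_w1` gives the exact primal distance, but only for tiny batches.

```python
import itertools
import random

# Illustrative sketch only; these helper names are hypothetical and not from DABA.


def binarize(z, threshold=0.5):
    """Map continuous encoder outputs in [0, 1] to a binary hash code."""
    return [1 if v >= threshold else 0 for v in z]


def quantization_error(z):
    """Mean squared gap between continuous outputs and their binarized code."""
    code = binarize(z)
    return sum((v - c) ** 2 for v, c in zip(z, code)) / len(z)


def bit_means(codes):
    """Fraction of codes with each bit set; 0.5 for every bit means balanced bits."""
    n = len(codes)
    return [sum(c[j] for c in codes) / n for j in range(len(codes[0]))]


def hamming(a, b):
    """Number of positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query, database, k):
    """Indices of the k database codes nearest to the query in Hamming distance."""
    order = sorted(range(len(database)), key=lambda i: (hamming(query, database[i]), i))
    return order[:k]


def primal_w1(xs, ys):
    """Exact Wasserstein-1 distance (L1 ground cost) between two equal-size
    batches with uniform weights, solved in the primal as an optimal assignment.

    Some optimal plan between uniform empirical measures of equal size is a
    permutation, so searching all assignments is exact. The search is
    factorial in the batch size, so it is only usable for tiny batches.
    """
    if len(xs) != len(ys):
        raise ValueError("batches must have the same size")
    n = len(xs)

    def l1(u, v):
        return sum(abs(p - q) for p, q in zip(u, v))

    return min(
        sum(l1(xs[i], ys[j]) for i, j in enumerate(perm)) / n
        for perm in itertools.permutations(range(n))
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed prior: independent Bernoulli(0.5) bits (balanced, uncorrelated).
    prior = [[rng.randint(0, 1) for _ in range(4)] for _ in range(4)]
    sharp = [[0.9 if b else 0.1 for b in row] for row in prior]  # near-binary outputs
    soft = [[0.5] * 4 for _ in range(4)]  # maximally uncertain outputs
    # The near-binary batch should sit much closer to the prior than the soft one.
    print(primal_w1(sharp, prior), primal_w1(soft, prior))
    codes = [binarize(z) for z in sharp]
    print(bit_means(codes), retrieve(codes[0], codes, k=2))
```

In this toy setting, outputs that are close to binary and distributed like the assumed prior yield a small distance, which is one way to see how a distribution-matching regularizer could encourage low quantization error and balanced bits without separate explicit penalty terms. A practical version would replace the permutation search with a polynomial-time assignment solver.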

Publication Details

Date of publication: April 19, 2020

Conference: ACM World Wide Web conference

Page number(s): 684-694

Publication Note: Khoa D. Doan, Chandan K. Reddy: Efficient Implicit Unsupervised Text Hashing using Adversarial Autoencoder. WWW 2020: 684-694
