Xuchao Zhang, Shuo Lei, Liang Zhao, Arnold Boediardjo

Abstract

The presence of data noise and corruptions has recently invoked increasing attention on robust least-squares regression (RLSR), which addresses this fundamental problem that learns reliable regression coefficients when response variables can be arbitrarily corrupted. Until now, the following important challenges could not be handled concurrently: (1) rigorous recovery guarantee of regression coefficients, (2) difficulty in estimating the corruption ratio parameter, and (3) scaling to massive datasets. This article proposes a novel Robust regression algorithm via Heuristic Corruption Thresholding (RHCT) that concurrently addresses all the above challenges. Specifically, the algorithm alternately optimizes the regression coefficients and estimates the optimal uncorrupted set via heuristic thresholding without a pre-defined corruption ratio parameter until its convergence. Moreover, to improve the efficiency of corruption estimation in large-scale data, a Robust regression algorithm via Adaptive Corruption Thresholding (RACT) is proposed to determine the size of the uncorrupted set in a novel adaptive search method without iterating data samples exhaustively. In addition, we prove that our algorithms benefit from strong guarantees analogous to those of state-of-the-art methods in terms of convergence rates and recovery guarantees. Extensive experiments demonstrate that the effectiveness of our new methods is superior to that of existing methods in the recovery of both regression coefficients and uncorrupted sets, with very competitive efficiency.

People

Shuo Lei


Liang Zhao


Publication Details

Date of publication:
June 7, 2019
Journal:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Page number(s):
1-22
Volume:
13
Issue Number:
3
Publication note:

Xuchao Zhang, Shuo Lei, Liang Zhao, Arnold P. Boedihardjo, Chang-Tien Lu: Robust Regression via Heuristic Corruption Thresholding and Its Adaptive Estimation Variation. ACM Trans. Knowl. Discov. Data 13(3): 28:1-28:22 (2019)