DeNovo: virus-host sequence-based protein–protein interaction prediction

Lenwood Heath

Abstract

Motivation: Can we predict protein–protein interactions (PPIs) of a novel virus with its host? Three major problems arise: the lack of known PPIs for that virus to learn from, the cost of learning about its proteins, and the sequence dissimilarity among viral families that makes most methods inapplicable or inefficient. We develop DeNovo, a sequence-based negative sampling and machine learning framework that learns from PPIs of different viruses to predict for a novel one, exploiting the shared host proteins. We tested DeNovo on PPIs from different domains to assess generalization.
Results: By solving the challenge of generating less noisy negative interactions, DeNovo achieved accuracy of up to 81% and 86% when predicting PPIs of viral proteins that have no sequence similarity and distant sequence similarity, respectively, to the ones used for training. This result is comparable to the best achieved in single virus-host and intra-species PPI prediction cases. Thus, we can now predict PPIs for virtually any virus infecting humans. DeNovo generalizes well; it achieved near-optimal accuracy when tested on bacteria–human interactions.
Availability and implementation: Code, data and additional supplementary materials needed to reproduce this study are available.

People

Lenwood Heath

Professor of Computer Science

Publication Details

Date of publication: April 20, 2016
Journal: Bioinformatics
Page number(s): 1144-1150
Volume: 32
Issue Number: 8