Chandan Reddy


Patients in hospitals, particularly in critical care, are susceptible to many complications affecting morbidity and mortality. Digitized clinical data in electronic medical records can be effectively used to develop machine learning models to identify patients at risk of complications early and provide prioritized care to prevent complications. However, clinical data from heterogeneous sources within hospitals pose significant modeling challenges. In particular, unstructured clinical notes are a valuable source of information containing regular assessments of the patient's condition but contain inconsistent abbreviations and lack the structure of formal documents. Our contributions in this paper are twofold. First, we present a new preprocessing technique for extracting features from informal clinical notes that can be used in a classification model to identify patients at risk of developing complications. Second, we explore the use of collective matrix factorization, a multi-view learning technique, to model heterogeneous clinical data-text-based features in combination with other measurements, such as clinical investigations, comorbidites, and demographic data. We present a detailed case study on postoperative respiratory failure using more than 700 patient records from the MIMIC II database. Our experiments demonstrate the efficacy of our preprocessing technique in extracting discriminatory features from clinical notes as well as the benefits of multi-view learning to combine clinical measurements with text data for predicting complications.


Chandan Reddy

Publication Details

Date of publication:
October 20, 2016
IEEE Access
Page number(s):