Our project, An Epidemiology of Information, examines the transmission of disease-related information about the “Spanish flu,” using digitized newspapers that are publicly available through the Chronicling America collection hosted by the Library of Congress. We rely primarily on two text mining methods: (1) segmentation via topic modeling and (2) tone classification. Although most historical accounts of the Spanish flu make extensive use of newspapers, our project is the first to ask how treating these texts as a large data source can contribute to historical understanding of this event while also providing humanities scholars, information scientists, and epidemiologists with new tools and insights. Our findings indicate that topic modeling is most useful for identifying broad patterns in the reporting on disease, while tone classification can identify the meanings available from these reports.