Wei Wang, Yue Ning, Naren Ramakrishnan
State-of-the-art event encoding approaches rely on sentence- or phrase-level labeling, which is both time-consuming and infeasible to extend to large-scale text corpora and emerging domains. Using a multiple instance learning approach, we take advantage of the fact that while labels at the sentence level are difficult to obtain, they are relatively easy to gather at the document level. This enables us to view the problems of event detection and extraction in a unified manner. Using distributed representations of text, we develop a multiple instance formulation that simultaneously classifies news articles and extracts sentences indicative of events without any engineered features. We evaluate our model on its ability to detect news articles about civil unrest events (from Spanish text) across ten Latin American countries and to identify the key sentences pertaining to these events. Our model, trained without annotated sentence labels, yields performance that is competitive with selected state-of-the-art models for event detection and sentence identification. Additionally, qualitative experimental results show that the extracted event-related sentences are informative and enhance various downstream applications such as article summarization, visualization, and event encoding.
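The multiple instance idea described above can be illustrated with a minimal sketch: each article is treated as a bag of sentences, each sentence receives a score, and the article-level prediction is derived from those scores, so only article-level labels are needed for supervision. The abstract does not state how sentence scores are aggregated or how key sentences are chosen, so the mean pooling and top-k selection below are assumptions, and the names `document_probability` and `key_sentences` are hypothetical. The sentence scores would come from a learned model; here they are supplied by hand.

```python
from typing import List, Tuple


def document_probability(sentence_probs: List[float]) -> float:
    """Aggregate sentence-level scores into one article-level score.

    Mean pooling is an assumption made for illustration; the abstract
    does not specify the aggregation function.
    """
    if not sentence_probs:
        raise ValueError("an article (bag) needs at least one sentence")
    return sum(sentence_probs) / len(sentence_probs)


def key_sentences(
    sentences: List[str], sentence_probs: List[float], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring sentences as (sentence, score) pairs."""
    if len(sentences) != len(sentence_probs):
        raise ValueError("need exactly one score per sentence")
    ranked = sorted(
        zip(sentences, sentence_probs), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]


# Hypothetical sentence scores for a three-sentence article.
sentences = ["Protesters blocked the main road.", "Weather was mild.", "Police responded."]
scores = [0.9, 0.1, 0.5]

article_score = document_probability(scores)
top = key_sentences(sentences, scores, k=2)
```

Because the article-level score is a function of the sentence scores, a loss computed against the article label alone can still shape the sentence scores, which is what allows key sentences to be extracted without sentence annotations.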
- Date of publication: October 24, 2016
- ACM International Conference on Information and Knowledge Management