A Survey of Document-Level Information Extraction

Hanwen (Zoe) Zheng, Sijia Wang, Lifu Huang

Abstract

Document-level information extraction (IE) is a crucial task in natural language processing (NLP). This paper conducts a systematic review of recent document-level IE literature. In addition, we conduct a thorough error analysis of current state-of-the-art algorithms and identify their limitations, as well as the remaining challenges for the task of document-level IE. According to our findings, labeling noise, entity coreference resolution, and lack of reasoning severely affect the performance of document-level IE. The objective of this survey paper is to provide more insights and help NLP researchers further enhance document-level IE performance.

Publication Details

Date of publication: September 22, 2023

Journal: arXiv

Publication Note: Hanwen Zheng, Sijia Wang, Lifu Huang: A Survey of Document-Level Information Extraction. CoRR abs/2309.13249 (2023)