
Keywords
event coreference, entity coreference, manual annotation tool, natural language processing
Abstract
Building coreference corpora requires a manual annotation tool that helps annotators label coreference relations between entity and event mentions across documents. To the best of our knowledge, CROss-document Main Events and entities Recognition (CROMER) is the only open-source manual annotation tool available for cross-document entity and event coreference. However, CROMER lacks multi-language support and extensibility. Moreover, to label cross-document coreference relations between mentions, CROMER depends on a separate intra-document coreference annotation tool, the Content Annotation Tool, which is no longer available. To address these problems, we introduce the Cross-Document Coreference Annotation Tool (CDCAT), a new multi-language open-source manual annotation tool for cross-document entity and event coreference that can handle different input/output formats, preprocessing functions, languages, and annotation systems. Using this new tool, annotators can label a coreference relation with only two mouse clicks. Best-practice analyses show that annotators can reach an annotation speed of 0.025 coreference relations per second on a corpus with a coreference density of 0.076 coreference relations per word. As the first multi-language open-source cross-document entity and event coreference annotation tool, CDCAT can theoretically achieve higher annotation efficiency than CROMER.
Recommended Citation
Xu, Yang; Xia, Boming; Wan, Yueliang; Zhang, Fan; Xu, Jiabo; and Ning, Huansheng (2022) "CDCAT: A Multi-Language Cross-Document Entity and Event Coreference Annotation Tool," Tsinghua Science and Technology: Vol. 27: Iss. 3, Article 11.
DOI: https://doi.org/10.26599/TST.2020.9010060
Available at: https://dc.tsinghuajournals.com/tsinghua-science-and-technology/vol27/iss3/11