NIFify

Web tool for building and validating Entity Linking datasets

April 2018 #entity-linking #javascript #nif #annotation #datasets

Overview

NIFify is a browser-based tool for the construction, curation and validation of Entity Linking (EL) datasets in the NLP Interchange Format (NIF). EL datasets are notoriously inconsistent: different annotators disagree on what counts as an entity, on the appropriate granularity of a mention, and on whether a long span should be split into nested entities. NIFify was designed to surface those disagreements explicitly and to make the annotation process auditable.

Technical design

Front-end. A single-page JavaScript application that runs entirely in the browser, rendering documents as selectable text and overlaying mention boundaries, candidate links and inter-annotator agreement signals.
Annotation model. Each document is loaded as a NIF context; mentions are stored with character offsets, surface form, candidate URIs (from DBpedia / Wikidata) and an optional "should-be-linked" decision following the criteria of "What should Entity Linking link?" (AMW 2018).
Validation pipeline. When a corpus is loaded, NIFify cross-checks annotators' decisions and highlights conflicts: missing mentions, divergent link targets, overlapping spans and inconsistent typing. Conflicts can be resolved in place and re-exported.
Interoperability. Import and export use NIF/Turtle so the datasets can be plugged directly into evaluation frameworks such as GERBIL or consumed by NifWrapper-based experiments.
Multilingual support. Documents in English, Spanish and German were curated with NIFify to produce the VoxEL benchmark, exercising the tool against the practical issues of multilingual EL.
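
As a rough illustration of the annotation model, validation pipeline and NIF/Turtle export described above, the following Python sketch stores mentions with character offsets, flags three of the conflict types (missing mentions, divergent link targets, overlapping spans) and serialises one annotator's mentions using NIF core and ITS RDF terms. It is not NIFify's own code, which is JavaScript; the Mention class, the function names, the example sentence and the http://example.org/doc1 document URI are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import combinations

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"
DBR = "http://dbpedia.org/resource/"


@dataclass(frozen=True)
class Mention:
    """One annotator's decision about one span (hypothetical structure)."""
    annotator: str
    begin: int    # offset of the first character
    end: int      # offset just past the last character
    surface: str  # text[begin:end]
    uri: str      # link target, e.g. a DBpedia resource


def mention(annotator, text, surface, uri):
    """Build a Mention from the first occurrence of `surface` in `text`."""
    begin = text.index(surface)
    return Mention(annotator, begin, begin + len(surface), surface, uri)


def find_conflicts(mentions):
    """Return (kind, span, detail) tuples for disagreements between annotators."""
    annotators = sorted({m.annotator for m in mentions})
    by_span = {}
    for m in mentions:
        by_span.setdefault((m.begin, m.end), {})[m.annotator] = m.uri
    conflicts = []
    for span in sorted(by_span):
        links = by_span[span]
        # A span that some annotator did not mark at all.
        missing = tuple(a for a in annotators if a not in links)
        if missing:
            conflicts.append(("missing_mention", span, missing))
        # The same span linked to different targets.
        targets = tuple(sorted(set(links.values())))
        if len(targets) > 1:
            conflicts.append(("divergent_link", span, targets))
    # Distinct spans that share at least one character (half-open intervals).
    for a, b in combinations(sorted(by_span), 2):
        if a[0] < b[1] and b[0] < a[1]:
            conflicts.append(("overlapping_spans", a, b))
    return conflicts


def turtle_literal(s):
    """Quote a string as a Turtle literal, escaping the basic special characters."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\n", "\\n").replace("\r", "\\r")
    return '"' + escaped + '"'


def to_turtle(doc_uri, text, mentions):
    """Serialise a document and its mentions as NIF/Turtle (simplified)."""
    ctx = f"<{doc_uri}#char=0,{len(text)}>"
    lines = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"{ctx} a nif:Context ;",
        f"    nif:isString {turtle_literal(text)} .",
    ]
    for m in mentions:
        lines += [
            "",
            f"<{doc_uri}#char={m.begin},{m.end}> a nif:Phrase ;",
            f"    nif:referenceContext {ctx} ;",
            f'    nif:beginIndex "{m.begin}"^^xsd:nonNegativeInteger ;',
            f'    nif:endIndex "{m.end}"^^xsd:nonNegativeInteger ;',
            f"    nif:anchorOf {turtle_literal(m.surface)} ;",
            f"    itsrdf:taIdentRef <{m.uri}> .",
        ]
    return "\n".join(lines)


# Two hypothetical annotators, A and B, working on the same sentence.
text = "Michael Jordan played for the Chicago Bulls."
mentions = [
    mention("A", text, "Michael Jordan", DBR + "Michael_Jordan"),
    mention("B", text, "Michael Jordan", DBR + "Michael_I._Jordan"),
    mention("A", text, "Chicago Bulls", DBR + "Chicago_Bulls"),
    mention("B", text, "Chicago", DBR + "Chicago"),
]
conflicts = find_conflicts(mentions)
for conflict in conflicts:
    print(conflict)

ttl = to_turtle("http://example.org/doc1", text,
                [m for m in mentions if m.annotator == "A"])
print(ttl)
```

In this example the shorter span "Chicago" is reported both as a missing mention and as an overlap with "Chicago Bulls", which is the kind of nested-entity disagreement the overview describes. Typing conflicts and the "should-be-linked" decision are left out to keep the sketch short.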

Research contributions

A workflow for building high-quality EL datasets that captures the rationale behind each annotation, not only the final triples.
Empirical evidence — drawn from manual re-annotation of DBpedia Spotlight, MSNBC-ACE2004 and other reference corpora — that widely used EL datasets contain systematic disagreements that affect benchmark scores.
A reusable artefact for the EL community, complementing the more theoretical work on fine-grained EL and on what should be linked at all.

Related publications

Rosales-Méndez, H.; Hogan, A.; Poblete, B. NIFify: Towards Better Quality Entity Linking Datasets. LA-WEB 2019.
Rosales-Méndez, H.; Hogan, A.; Poblete, B. VoxEL: A Benchmark Dataset for Multilingual Entity Linking. ISWC 2018.
Rosales-Méndez, H.; Poblete, B.; Hogan, A. What should Entity Linking link? AMW 2018.
Rosales-Méndez, H.; Hogan, A.; Poblete, B. Fine-Grained Entity Linking. Journal of Web Semantics, 2020.

Open NIFify