NIFify
Web tool for building and validating Entity Linking datasets
April 2018
#entity-linking #javascript #nif #annotation #datasets
Overview
NIFify is a browser-based tool for the construction, curation and
validation of Entity Linking (EL) datasets in NIF, the NLP Interchange
Format. EL datasets are notoriously inconsistent: different annotators
disagree on what counts as an entity, on the appropriate granularity of a mention,
and on whether a long span should be split into nested entities. NIFify was designed
to surface those disagreements explicitly and to make the annotation process auditable.
Technical design
- Front-end. A single-page JavaScript application that runs entirely
in the browser, rendering documents as selectable text and overlaying mention
boundaries, candidate links and inter-annotator agreement signals.
- Annotation model. Each document is loaded as a NIF context;
mentions are stored with character offsets, surface form, candidate URIs (from
DBpedia / Wikidata) and an optional "should-be-linked" decision following the
criteria of the paper "What should Entity Linking link?" (AMW 2018).
- Validation pipeline. When a corpus is loaded, NIFify cross-checks
annotators' decisions and highlights conflicts: missing mentions, divergent link
targets, overlapping spans and inconsistent typing. Conflicts can be resolved in
place and the corrected corpus re-exported.
- Interoperability. Import and export use NIF/Turtle, so the
datasets can be plugged directly into evaluation frameworks such as GERBIL or
consumed by NifWrapper-based experiments.
- Multilingual support. Documents in English, Spanish and German
were curated with NIFify to produce the VoxEL benchmark, exercising the
tool against the practical issues of multilingual EL.
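To make the annotation model and the validation pipeline more concrete, here is a minimal Python sketch of how two annotators' mentions for one document could be cross-checked. It is not NIFify's actual code (the tool itself is a JavaScript application): the names Mention and find_conflicts, the field layout and the example link targets are assumptions made for illustration, and only three of the conflict kinds listed above are covered (typing conflicts are left out).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]


@dataclass(frozen=True)
class Mention:
    """One annotated mention (hypothetical layout, not NIFify's schema)."""
    begin: int                  # start character offset, inclusive
    end: int                    # end character offset, exclusive
    surface: str                # text covered by [begin, end)
    link: Optional[str] = None  # chosen KB URI, or None if left unlinked


def overlaps(x: Span, y: Span) -> bool:
    """True if two half-open character spans share at least one character."""
    return x[0] < y[1] and y[0] < x[1]


def find_conflicts(a: List[Mention], b: List[Mention]) -> List[Tuple[str, Span]]:
    """Compare two annotators' mentions for the same document."""
    spans_a: Dict[Span, Mention] = {(m.begin, m.end): m for m in a}
    spans_b: Dict[Span, Mention] = {(m.begin, m.end): m for m in b}
    conflicts: List[Tuple[str, Span]] = []

    # Same span marked by both annotators, but linked to different targets.
    for span in sorted(spans_a.keys() & spans_b.keys()):
        if spans_a[span].link != spans_b[span].link:
            conflicts.append(("divergent_link", span))

    # Spans marked by only one annotator: "overlap" if they clash with a
    # span of the other annotator, otherwise "missing" on the other side.
    only_a = sorted(spans_a.keys() - spans_b.keys())
    only_b = sorted(spans_b.keys() - spans_a.keys())
    for span, others in [(s, spans_b) for s in only_a] + [(s, spans_a) for s in only_b]:
        kind = "overlap" if any(overlaps(span, o) for o in others) else "missing"
        conflicts.append((kind, span))
    return conflicts


DBR = "http://dbpedia.org/resource/"
text = "Michael Jackson was born in Gary, Indiana, USA."

# Annotator A links the whole place name and also marks the country.
annotator_a = [
    Mention(0, 15, "Michael Jackson", DBR + "Michael_Jackson"),
    Mention(28, 41, "Gary, Indiana", DBR + "Gary,_Indiana"),
    Mention(43, 46, "USA", DBR + "United_States"),
]
# Annotator B picks another target for the person (illustrative URI)
# and splits the place name into two shorter mentions.
annotator_b = [
    Mention(0, 15, "Michael Jackson", DBR + "Michael_Jackson_(writer)"),
    Mention(28, 32, "Gary", DBR + "Gary,_Indiana"),
    Mention(34, 41, "Indiana", DBR + "Indiana"),
]

conflicts = find_conflicts(annotator_a, annotator_b)
for kind, (begin, end) in conflicts:
    print(f"{kind:15} {begin:>2}-{end:<2} {text[begin:end]!r}")
```

Keying mentions by their exact (begin, end) span keeps the comparison simple: identical spans are compared by link target, and everything else is classified by whether it collides with a span from the other annotator. A real tool would likely also report which annotator produced each span.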
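For the interoperability item above, the following Python sketch shows roughly what a NIF/Turtle export of one document with linked mentions looks like. It uses the NIF 2.0 core vocabulary (nif:Context, nif:isString, nif:beginIndex, nif:endIndex, nif:anchorOf, nif:referenceContext) and itsrdf:taIdentRef for the link target. The function name to_nif_turtle, the document URI http://example.org/doc1 and the exact set of triples emitted are assumptions for illustration; NIFify's own export may differ.

```python
from typing import List, Tuple

PREFIXES = (
    "@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .\n"
    "@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping the common special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_nif_turtle(doc_uri: str, text: str, mentions: List[Tuple[int, int, str]]) -> str:
    """Serialize a document and its (begin, end, link) mentions as NIF/Turtle."""
    # The context resource covers the whole document text.
    context = f"<{doc_uri}#char=0,{len(text)}>"
    lines = [
        PREFIXES,
        f"{context} a nif:Context, nif:String, nif:RFC5147String ;",
        f"    nif:isString {turtle_literal(text)} ;",
        '    nif:beginIndex "0"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{len(text)}"^^xsd:nonNegativeInteger .',
    ]
    # Each mention is a substring resource pointing back to its context.
    for begin, end, link in mentions:
        lines += [
            "",
            f"<{doc_uri}#char={begin},{end}> a nif:String, nif:RFC5147String, nif:Phrase ;",
            f"    nif:referenceContext {context} ;",
            f"    nif:anchorOf {turtle_literal(text[begin:end])} ;",
            f'    nif:beginIndex "{begin}"^^xsd:nonNegativeInteger ;',
            f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger ;',
            f"    itsrdf:taIdentRef <{link}> .",
        ]
    return "\n".join(lines) + "\n"


doc_text = "Michael Jackson was born in Gary, Indiana, USA."
doc_mentions = [
    (0, 15, "http://dbpedia.org/resource/Michael_Jackson"),
    (28, 41, "http://dbpedia.org/resource/Gary,_Indiana"),
]
ttl = to_nif_turtle("http://example.org/doc1", doc_text, doc_mentions)
print(ttl)
```

Building the Turtle by hand keeps the sketch dependency-free; in practice an RDF library would be a safer way to handle escaping and validation before passing the file to an evaluation framework.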
Research contributions
- A workflow for building high-quality EL datasets that captures the rationale
behind each annotation, not only the final triples.
- Empirical evidence (drawn from manual re-annotation of DBpedia Spotlight,
MSNBC-ACE2004 and other reference corpora) that widely used EL datasets contain
systematic disagreements that affect benchmark scores.
- A reusable artefact for the EL community, complementing the more theoretical work
on fine-grained EL and on what should be linked at all.
Related publications
- Rosales-Méndez, H.; Hogan, A.; Poblete, B.
NIFify: Towards Better Quality Entity Linking Datasets. LA-WEB 2019.
- Rosales-Méndez, H.; Hogan, A.; Poblete, B.
VoxEL: A Benchmark Dataset for Multilingual Entity Linking. ISWC 2018.
- Rosales-Méndez, H.; Poblete, B.; Hogan, A.
What should Entity Linking link? AMW 2018.
- Rosales-Méndez, H.; Hogan, A.; Poblete, B.
Fine-Grained Entity Linking. Journal of Web Semantics, 2020.