Toponym Matching Evaluation
When a historical newspaper mentions "Manchester," does it refer to Manchester, England, or one of the 30+ other Manchesters worldwide? Candidate selection narrows down which entities a recognized place name could plausibly refer to, a critical but often overlooked step before full entity resolution. This paper applies state-of-the-art neural networks to toponym matching, handling the substantial variation that makes place names so challenging: cross-lingual variations (München vs. Munich), regional differences (neighborhood names that don't appear in gazetteers), and OCR errors that corrupt spellings. The evaluation table shows F1 scores across these challenging scenarios in English and Spanish datasets. By improving candidate selection, the method enables more accurate downstream analysis of which places historical texts actually refer to, unlocking the geographic dimension of digitized archives.

Abstract

Recognizing toponyms and resolving them to their real-world referents is required to provide advanced semantic access to textual data. This process is often hindered by the high degree of variation in toponyms. Candidate selection is the task of identifying the potential entities that can be referred to by a previously recognized toponym. While it has traditionally received little attention, candidate selection has a significant impact on downstream tasks (i.e., entity resolution), especially in noisy or non-standard text. In this paper, we introduce a deep learning method for candidate selection through toponym matching, using state-of-the-art neural network architectures. We perform an intrinsic toponym matching evaluation based on several datasets, which cover various challenging scenarios (cross-lingual and regional variations, as well as OCR errors), and assess the method's performance in the context of geographical candidate selection in English and Spanish.

Keywords: candidate selection, deep learning, fuzzy string matching, toponym matching, geographical entity resolution, neural networks
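
To make the candidate selection task described in the abstract concrete, the following is a minimal sketch in Python. It is not the paper's method: the learned neural matcher is replaced here by a plain character-based similarity from the standard library (`difflib.SequenceMatcher`), and the gazetteer, entity IDs, function names, and threshold are all hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy gazetteer: entity ID -> known name variants (illustrative only).
GAZETTEER = {
    "manchester_uk": ["Manchester"],
    "manchester_nh": ["Manchester"],
    "munich_de": ["Munich", "München"],
    "madrid_es": ["Madrid"],
}


def similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1].

    Assumption: a trained toponym-matching model would replace this function.
    """
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def select_candidates(mention, gazetteer=GAZETTEER, threshold=0.8):
    """Return (entity_id, score) pairs whose best name variant reaches `threshold`."""
    scored = []
    for entity_id, names in gazetteer.items():
        # An entity is scored by its closest known name variant.
        best = max(similarity(mention, name) for name in names)
        if best >= threshold:
            scored.append((entity_id, best))
    # Highest score first; ties are broken alphabetically by entity ID.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # "Manchcster" imitates an OCR error; both Manchester entries remain candidates.
    print(select_candidates("Manchcster"))
    # A cross-lingual variant matches only because the gazetteer lists it explicitly.
    print(select_candidates("München"))
```

In this sketch the ambiguity between the two Manchester entries is deliberately left unresolved: candidate selection only proposes plausible entities, and choosing among them is the job of the downstream entity resolution step. A purely character-based score also cannot link "München" to "Munich" unless the variant is already listed, which is the kind of variation a learned matcher is meant to handle.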