Dimitris Bilianos
National and Kapodistrian University of Athens

Comparing Named Entity Recognition in Classical and Modern Languages: Insights from Plutarch’s Life of Alexander

Keywords: NER; Ancient Greek; Corpus Annotation; NLP

This study explores the application of Named Entity Recognition (NER) to Plutarch’s Life of Alexander in its original Ancient Greek and an English translation. The goal is to assess how existing NER models process and categorize entities across two languages, and to identify key issues arising in cross-linguistic NER annotation. For this study, I employ three tools: the UGARIT Flair model trained on Ancient Greek, the standard Flair English model, and NLTK’s pre-trained English model.

After preprocessing the corpora (sentence splitting and tokenization), I extracted the 50 most frequent entities identified by each system. A gold standard was constructed via manual annotation, assigning the standard NER tags PER (person), LOC (location), and ORG (organization), plus a custom GRP (group/ethnicity) tag for labels such as Macedonian and Greek. GRP entities were excluded from the final evaluation because the models handled them inconsistently, but they are noted separately for further research.
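As a minimal sketch of the frequency step, assuming each tagger's output has already been flattened into (entity text, tag) pairs: the helper name `top_entities` and the sample mentions below are hypothetical and only illustrate the counting, not the actual pipeline used in the study.

```python
from collections import Counter

def top_entities(mentions, n=50):
    """Return the n most frequent entity surface forms with their counts."""
    counts = Counter(text for text, _tag in mentions)
    return counts.most_common(n)

# Hypothetical tagger output: one (entity text, predicted tag) pair per mention.
mentions = [
    ("Alexander", "PER"), ("Philip", "PER"), ("Alexander", "PER"),
    ("Delphi", "LOC"), ("Alexander", "PER"), ("Philip", "PER"),
]

top = top_entities(mentions, n=2)
print(top)
```

Counting surface forms rather than (form, tag) pairs keeps an entity in a single row even when a model tags it inconsistently across mentions.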

The systems were evaluated on their agreement with the gold standard. The UGARIT Flair model showed 88% accuracy on the top-50 entities, the English Flair model 90%, and NLTK 76%. A key insight is that agreement between at least two out of three models occurred in 92% of cases, suggesting high convergence despite differences in training data and linguistic structure. This led to a proposed voting-based ensemble method: when at least two models agree on a tag, that annotation is selected. This method outperformed all individual models on the top-50 entity evaluation.
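The voting rule can be sketched as follows. This is an illustrative reading of the rule, not the exact implementation: the function name `vote` and the fallback of returning `None` when no two models agree are assumptions, and only the Delphi row uses tags reported in this abstract.

```python
from collections import Counter

def vote(tags, min_agree=2):
    """Return the tag shared by at least min_agree models, else None."""
    tag, count = Counter(tags).most_common(1)[0]
    return tag if count >= min_agree else None

# Tags per entity in the order (UGARIT Flair, English Flair, NLTK).
predictions = {
    "Delphi": ("LOC", "ORG", "ORG"),     # tags as reported in the text
    "Alexander": ("PER", "PER", "PER"),  # illustrative values
    "Example": ("PER", "LOC", "ORG"),    # hypothetical three-way split
}

ensemble = {entity: vote(tags) for entity, tags in predictions.items()}
print(ensemble)
```

How to resolve a three-way split (for instance, by deferring to the strongest single model) is left open here; the sketch simply leaves such cases unlabeled.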

Notable examples of divergence include Delphi, tagged as LOC by the Ancient Greek model but as ORG by both English models, and Apollo and Hercules, which the English Flair model tagged as ORG instead of PER. These instances illustrate how cultural and lexical ambiguity, especially in mythological and institutional references, poses challenges for cross-linguistic and diachronic NER tasks.

The gold standard covers the 50 most frequently extracted entities across all systems, which together accounted for over half of all detected instances. In absolute terms, the English Flair model showed the highest agreement with this reference, with only 5 mismatches, closely followed by the Ancient Greek UGARIT Flair model (6 mismatches), while NLTK lagged behind with 12. The strong inter-model consensus, with at least two models agreeing on the classification in 92% of cases, underscores the potential of ensemble methods and motivates the voting-based ensemble approach as a practical way to improve consistency across systems.
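The mismatch counts and the accuracy figures are two views of the same evaluation over 50 entities. The short check below makes the conversion explicit; the function name `accuracy_from_mismatches` is a hypothetical helper, while the counts are those reported in the text.

```python
TOTAL = 50  # number of entities in the gold standard

# Mismatch counts against the gold standard, as reported in the text.
mismatches = {"English Flair": 5, "UGARIT Flair": 6, "NLTK": 12}

def accuracy_from_mismatches(n_mismatch, total=TOTAL):
    """Share of gold-standard entities on which the model agrees."""
    return (total - n_mismatch) / total

accuracies = {m: accuracy_from_mismatches(k) for m, k in mismatches.items()}
for model, acc in accuracies.items():
    print(f"{model}: {acc:.0%}")
```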

This work contributes to contrastive linguistics by demonstrating how annotation and tagging conventions vary across languages and models. It also highlights the value of cross-linguistic corpora in evaluating NLP tools and their linguistic assumptions. Future directions include expanding the gold standard, training a model incorporating a GRP tag, and applying ensemble approaches to improve NER performance in multilingual and low-resource settings, particularly for historical texts.

References

Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media.

Johnson, K. P., Burns, P. J., Stewart, J., Cook, T., Besnier, C., & Mattingly, W. J. (2021). The Classical Language Toolkit: An NLP framework for pre-modern languages. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations (pp. 20–29).

Palladino, C., & Yousef, T. (2024). Development of robust NER Models and Named Entity Tagsets for Ancient Greek. In R. Sprugnoli & M. Passarotti (Eds.), 3rd Workshop on Language Technologies for Historical and Ancient Languages, LT4HALA 2024 at LREC-COLING 2024 - Workshop Proceedings (pp. 89–97). European Language Resources Association (ELRA). https://aclanthology.org/2024.lt4hala-1.11

Yousef, T., Palladino, C., & Jänicke, S. (2022). Transformer-Based Named Entity Recognition for Ancient Greek. DOI:10.13140/RG.2.2.34846.61761.