Evaluation of Arabic Named Entity Recognition Models on Sahih Al-Bukhari Text

EasyChair Preprint 9573, version 2


8 pages • Date: January 16, 2023

Ibtisam Khalaf Alshammari, Eric Atwell and Mohammad Ammar Alsalka

Abstract

In this paper, four Arabic named entity recognition models (CAMeLBERT-CA, Hatmimoha, Marefa-NER, and Stanza) were applied to the Sahih Al-Bukhari dataset. The main aim of this study is to identify the best-performing of these tools for use on other Hadith datasets. Stanza and Marefa-NER performed best, obtaining F1-scores of 0.826191 and 0.807396, respectively. A new test dataset of around 5000 words was then created based on the CANERCorpus annotation. All four models were evaluated on this new test dataset, and all obtained disappointing F1-scores, although Hatmimoha achieved the best result. This problem has probably arisen because the dataset is small. However, we observed that models that have many named entity classes and match the CANERCorpus labels, such as Hatmimoha and Marefa-NER, could obtain high performance.
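
For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall. The following is a minimal sketch of a token-level, micro-averaged computation. The abstract does not state whether the authors scored at token or entity level or how they averaged across classes, so that choice, the function name token_f1, and the label strings in the usage example are all assumptions made for illustration.

```python
def token_f1(gold, pred, outside="O"):
    """Micro-averaged token-level precision, recall and F1.

    Assumption: scoring is per token and the label `outside` marks
    tokens that are not part of any named entity.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != outside and p == g:
            tp += 1          # predicted entity label matches gold
        else:
            if p != outside:
                fp += 1      # predicted an entity label that is wrong
            if g != outside:
                fn += 1      # gold entity label was missed or mislabeled

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels, not taken from the paper's data.
gold = ["PER", "O", "LOC", "O", "PER"]
pred = ["PER", "O", "O", "LOC", "PER"]
precision, recall, f1 = token_f1(gold, pred)
```

In this toy example two of the three predicted entity tokens are correct and two of the three gold entity tokens are found, so precision, recall and F1 all equal 2/3. A model whose label set does not match the annotation scheme would accumulate false positives and false negatives on every mismatched class, which is consistent with the abstract's observation about label compatibility with CANERCorpus.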

Keyphrases: Arabic NER Models, CANERCorpus Annotation, Models Evaluation, Sahih Al-Bukhari

Links:

https://easychair.org/publications/preprint/FBHJ

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:9573,
  author    = {Ibtisam Khalaf Alshammari and Eric Atwell and Mohammad Ammar Alsalka},
  title     = {Evaluation of Arabic Named Entity Recognition Models on Sahih Al-Bukhari Text},
  howpublished = {EasyChair Preprint 9573},
  year      = {EasyChair, 2023}}
