+++
title = "Monolingual NER Results for various Languages"
date = 2019-02-04T22:34:35+05:30
subtitle = ""
summary = ""
authors = []
tags = ['named entity recognition', 'Indian Languages', 'European Languages']
categories = ['named entity recognition', 'Indian Languages', 'European Languages']
featured = false
draft = false

# List format.
#   0 = Simple
#   1 = Detailed
#   2 = Stream
list_format = 2

# Optional featured image (relative to `static/img/` folder).
[header]
image = ""
caption = ""
discussionId = 2
+++

The neural NER system that I implemented as part of the TALLIP paper and the ACL 2018 paper achieves the following F1 scores on various languages.

Results

| Language | Dataset | Word Embeddings | Reference | F1 Score |
|----------|---------|-----------------|-----------|----------|
| English | CoNLL 2003 | Spectral Embeddings | Arxiv Paper | 90.94 |
| Spanish | CoNLL 2002 | Spectral Embeddings | Arxiv Paper | 85.75 |
| Dutch | CoNLL 2002 | Spectral Embeddings | Arxiv Paper | 85.20 |
| German | Link | Spectral Embeddings | ACL 2018 Paper | 87.64 |
| Italian | Evalita 2009 | Spectral Embeddings | ACL 2018 Paper | 75.98 |
| Hindi | FIRE 2014 | Fasttext Embeddings | ACL 2018 Paper | 64.93 |
| Marathi | FIRE 2014 | Fasttext Embeddings | ACL 2018 Paper | 61.46 |
| Bengali | FIRE 2014 | Fasttext Embeddings | ACL 2018 Paper | 55.61 |
| Malayalam | FIRE 2014 | Fasttext Embeddings | ACL 2018 Paper | 64.59 |
| Tamil | FIRE 2014 | Fasttext Embeddings | ACL 2018 Paper | 65.39 |
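
For readers unfamiliar with how NER F1 scores are usually computed, here is a minimal Python sketch. It assumes entity-level scoring over exact span matches on BIO tags, in the style of the CoNLL shared-task evaluation. This post does not state which scoring script produced the numbers above, so treat this as an illustration only. The function names `extract_spans` and `entity_f1` and the toy tag sequences are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith("B-") or tag.startswith("I-"):
            # Assumption: a stray I- tag is treated leniently as a span start.
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level F1 over a corpus; a span counts only on exact match."""
    gold_spans, pred_spans = set(), set()
    for sent_id, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= {(sent_id,) + s for s in extract_spans(g)}
        pred_spans |= {(sent_id,) + s for s in extract_spans(p)}
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (made up): one of the two gold entities is recovered.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(round(100 * entity_f1(gold, pred), 2))
```

In this toy case precision is 1.0 and recall is 0.5, so the F1 score is 2/3, reported as a percentage in the same way as the table.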

PPS: The monolingual NER performance for Bengali, Tamil and Malayalam differs from the published results because of certain pre-processing steps that were not performed in the ACL 2018 paper. We observed that some of the sentences are longer than 200 words. Manually splitting these longer sentences into smaller ones using ‘